Preface


The third edition of Business Statistics: Basic Concepts and Methodology retains 
the best features of the first two editions. It also incorporates new material that 
our own experience and that of other users of the previous editions indicate will 
make for a greatly improved text. Considerable polishing and rewriting of this 
third edition was done with both the student and the teacher in mind. Our objectives
are (1) to make the subject matter clear and understandable to the student
and (2) to provide the instructor with the most teachable text possible. 

Major additions and changes in this edition are as follows. 

1. Changes in topical coverage that have been made specifically in response to 
text users’ suggestions: 

• Analysis of variance. In this third edition, we extend our discussion of one-way
analysis of variance to include the case of unequal sample sizes. We also
include a discussion of the Tukey multiple-comparison procedure for unequal
sample sizes. In addition, we illustrate the relationship between the t test for two
independent samples and one-way ANOVA.

• Simple linear regression. We have changed the sequence of topics in the chapter
on simple linear regression analysis, in order to clarify the concepts and computations
of this important statistical technique.

• Nonparametric statistics. We have added to the chapter on nonparametric statistics
discussions of the Wilcoxon signed-rank test for location and nonparametric
regression analysis. We have replaced the median test with the Mann-Whitney
test, which we consider a generally more useful test. We believe that these
changes will be of considerable benefit to the student.

• Appendix tables. The tables have been revised so that the binomial, Poisson, 
t, F, χ², and normal distributions all now give the probability of obtaining a value
of the test statistic as small as or smaller than the one calculated. This feature 
should make it easier for students to use the tables. 

• Time-series analysis and index numbers. We have expanded these topics by 
adding discussions of additional forecasting techniques and the Consumer Price 
Index. In addition, we present a general procedure for converting a secular trend 
equation from one time unit to another time unit. We also discuss in more detail 
the procedure for shifting of the origin. 

• Summation. We have expanded the explanation of summation notation in Appendix III
to cover the case of double subscripts.

2. Computer applications. This third edition contains a greater emphasis on 
computer applications than previous editions. We discuss the advantages and capabilities
of computers in greater detail. We provide sample computer printouts
as solutions to some of the problems and teach the student how to interpret the
output. Many of the exercises are presented in the form of mini-cases, complete 
with “raw” data. (For example, see those at the end of Chapter 7.) These should 
provide realistic input for computer applications. 

In addition, a data base has been added in Appendix II, presenting data on 10 
variables for each member of a population of 1,000 fictitious heads of households. 
This provides instructors with the opportunity to simulate actual research projects. 

3. More exercises. We have added more than 100 new exercises to the present 
edition, bringing the total number of exercises in the text to well over 600. The 
new exercises appear at the ends of major sections within chapters, as well as at 
the ends of chapters. We have also added mini-cases, called Statistics at Work, 
at the ends of many chapters. In addition to the exercises, the book contains many 
study questions that appear at the ends of chapters. 

4. More illustrations. This third edition contains almost 50 percent more drawings
than the second edition, a feature that we think will help readers understand
the text material more readily. 

5. Student learning aids. For this third edition, two student learning aids are 
available: a study guide and a data base on computer tape.

• Study Guide. The expanded Study Guide, with its programmed-instruction format,
enables students to check their ability to use computational techniques by
comparing their step-by-step solutions on the right-hand side of a given page with
step-by-step answers that appear on the left-hand side. To check their mastery of
statistical concepts, students complete fill-in-the-blank questions. Again the answers
are conveniently provided in the left-hand column.

• Data Base. The data base that appears in the text’s Appendix II is also available 
on computer tape for use with SPSS and other statistical packages. The student 
can draw samples from the population for the purpose of computing descriptive 
measures, constructing confidence intervals, testing hypotheses, and using all the 
statistical techniques described in the text. 

We are deeply indebted to many people who have contributed to the production 
of this third edition of Business Statistics. First of all, we wish to thank Mary 
Daniel (Mrs. Wayne Daniel), who typed the various drafts of the manuscript and 
also helped proofread the book. 

We are especially grateful to the members of the faculty of the Department of 
Quantitative Methods at Georgia State University, who used the first and second 
editions in their classes and made invaluable suggestions for improvement. We 
are particularly grateful to Professors Brian Schott and Geoffrey Churchill, who
wrote the computer programs to produce the binomial, Poisson, and normal distribution
tables found in Appendix I; to Professor Ron Shiffler, who did the
accuracy review of the manuscript; and to Professor Bill Thompson, who selected 
the material for the transparency masters. 

Several of our colleagues across the country, whose identities were unknown to
us at the time, made valuable suggestions which we incorporated in this third
edition. After their job was completed, we found to our satisfaction that they were
Professors Ronald Klein, Columbus (Georgia) College; Mae Mulherin, Georgia
College; William Haleman, University of Maine; John E. Ullman and Pat Ramsey,
Hofstra University; Milton Mitchell, University of Wisconsin, Oshkosh; Joe Nosari,
Florida State University; Jeff Green, Ball State University; Wallace Blischke,
University of Southern California; and Vern Vincent, Pan American University.
To all of them we offer our sincere appreciation. 

Special thanks go to the following people, who provided us with detailed critiques
of the second edition and recommendations for making improvements in
the third edition: William Burrell, Wayne State University; Dorothy Jones, Stockton
State University; Glenn Milligan, Ohio State University; William Sallas, University
of Houston; and Ron Shiffler, Georgia State University.

In preparing this third edition of Business Statistics: Basic Concepts and Methodology,
we have tried to make judicious use of this wealth of expertise. Blame
for any remaining deficiencies, however, must rest on our own shoulders. 

W.W.D. 

J.C.T. 





1. The Role of Statistics in Decision Making


Chapter Objectives: This chapter is concerned with the
increasing complexity confronting the manager or business
decision-maker in today's world. It discusses the
role that statistics can play in the decision-making
process. It also covers the basic principles and steps involved
in planning and conducting statistical studies.
After studying this chapter, you should be able to: 

1. Explain the major reasons for the increasing use of 
the scientific method and management information 
systems by business decision-makers and researchers 

2. Describe how statistics relates to business decision 
making 

3. Discuss the basic principles involved in conducting statistical
studies

4. List steps that can help to ensure that statistical studies
are properly planned and conducted



1.1 INTRODUCTION 


The further we move into the scientific age, the more complex our world becomes. 
Both our needs for information and the quantity of information available continue 
to expand rapidly. Managers or researchers in every field must plan carefully, so 
that the quantity and quality of information they obtain are adequate to meet their 
needs. Managers find the techniques and concepts of a management information 
system appropriate for this purpose. Researchers use the scientific method. 

A well-planned management information system enables a business firm to 
determine and examine its informational needs in perspective. That is, the firm 
can evaluate the importance of each need relative to the overall operation of the 
firm. To be effective, a management information system needs people skilled in 
a wide array of quantitative techniques. Even more important to its success is a 
quantitatively oriented management. 

The scientific method is characterized by objectivity, inductive reasoning, and 
a systematic explanation and measurement of facts. The accumulation of facts is 
followed by the formulation of concepts, hypotheses, and theory, all of which 
may be modified later as additional facts are collected. 

The ultimate objective of managers and researchers is to assemble information 
of sufficient quantity and quality to provide a basis for making sound decisions. 
In the use of both a management information system and the scientific method, 
the person trained in statistics can make an important contribution. 

1.2 THE ROLE OF STATISTICS IN DECISION MAKING 

Statistics may be described as the technology of the scientific method. It consists 
of a set of tools that are used to facilitate the making of decisions whenever 
conditions of uncertainty prevail. These tools are used in many fields other than 
business, for example, biology, medicine, agriculture, psychology, and education. 
Certain fields require special techniques. But the same basic principles and concepts
apply to all fields. Note that statistics is a set of tools whose proper use
helps in decision making. Only rarely should these tools be used as the sole basis
for a decision. Statistics presents the decision-maker with relevant facts and, in
many cases, provides an estimate of the probability of making a wrong decision.
In the business world, the concepts, techniques, and results of statistics are indispensable
components of decision making.

The computer has greatly improved the ease and rapidity of using statistical 
methods. Because it makes large numbers of complex calculations in seconds, it 
has made commonplace the use of statistical methods that were previously impractical.

1.3 BASIC PRINCIPLES AND CONCEPTS OF SPECIAL STUDIES 

Much of the information that serves as the basis for decision making within a firm 
is generated routinely in everyday operations. On occasion, however, routinely
available data do not provide an adequate foundation for an important decision.
Then a firm has to obtain the needed information in a nonroutine manner. It may 
have to collect additional data or implement a special research project. In the 
discussion that follows, we refer to both nonroutine data-gathering projects and 
research projects as special studies, or simply studies. We assume a business 
context. 

In executing special studies, statisticians cannot merely apply statistical techniques.
They must also be concerned with the appropriateness and quality of the
data. The objectives of each study determine the data needed, the data quality 
needed, and the technique or combination of techniques to be used in analyzing 
the data. Since a study is conducted to fulfill certain objectives, it should be 
designed to meet those objectives as efficiently and effectively as possible. In 
view of the self-evident truth of this statement, you may wonder why so many 
studies fail to achieve their objectives. One reason is that most studies are more 
complex than they appear. Studies require several phases of planning as well as 
several phases of execution. Each phase must be handled thoroughly and in proper 
sequence if the study is to be effective. All phases are interrelated. Problems 
encountered during one phase often require changes in other phases. Thus, even 
a very carefully designed study may require extensive revisions when unforeseen 
problems arise. In fact, some studies must be abandoned because of difficulties 
that either are recognized in the planning phases or arise during execution. It’s 
better to recognize potential problems during the planning phases so that they can 
be handled by the study design or, if necessary, so that the study can be dropped. 

When we propose any study, we must answer two questions: (1) “Can it be of 
real value?” and (2) “Is it feasible?” If the answers to both are Yes, then we 
must decide whether the study is more desirable than alternative studies that may 
be equally appropriate. In determining the potential value of a proposed study, 
the criterion that we should use is the contribution that the study can make to the 
supply of data needed to meet the firm’s goals. In too many cases, studies have 
been conducted when it should have been apparent that even the most thoroughly 
planned and executed study would be of little value. 

How does one determine the value and feasibility of a study? The first step is 
to obtain a clear statement of the study objectives, supported by documentation 
showing why the study is needed and how the results will be used. At this point, 
we have answered the question of the potential value of the study. If the firm 
accepts the need for the study and its potential value, one must determine its 
feasibility and practicality. One must answer the following questions: (1) Is it 
logically possible to conduct the study so as to achieve its objective(s)? (2) Are 
required data available, or can they be obtained with reasonable effort? (3) Will 
the needed resources—personnel, equipment, and money—be available? (4) Will 
the study be of sufficient value to the firm to warrant using these resources? 

A thorough examination of proposed studies will ensure that we undertake sound 
studies and discard poor ones before the firm has invested significant resources in 
them. Not all proposed studies are worthwhile; the sooner poor study proposals 
are recognized, the better. Moreover, just because a study is undertaken does not 
necessarily mean that it should be followed through to completion. In spite of the
most careful planning, problems may arise during the study that will prevent its
objectives from being achieved fully. We must evaluate the effect of such problems
and try to salvage these studies. Often, we may still be able to achieve the
original objectives, or revised ones. However, if we determine that the results 
that we can achieve will not be of sufficient value to the firm, we may be justified 
in discontinuing the study. 


1.4 STEPS INVOLVED IN PLANNING AND CONDUCTING 
SPECIAL STUDIES 


This section presents a sequence of steps to help transform a proposal into a well-designed
and well-executed study. These steps apply the principles discussed in
the preceding section. In fact, the steps that make up the planning phase may
prove useful in determining the feasibility of a proposed study. Think of these
steps as a set of recommended procedures, not as inflexible rules. They are intended
to meet the need for “planning before acting.” Following these steps will
help you to achieve objectives with a minimum of effort. Yet they won’t keep 
you from investigating leads and hunches that might alter the dimensions of the 
study. 

We may view the planning and execution of a study as consisting of 10 steps, 
as shown in Figure 1.4.1. We may divide the steps into a planning phase and an 
accomplishment phase, each consisting of five steps. Alternatively, we may view 
them as five steps, each consisting of a planning phase and an accomplishment 
phase. 


FIGURE 1.4.1 Flowchart for planning and executing a study

Ideally, we should complete the planning phase for all five steps before we
begin the accomplishment phase. Planning is conducted in a step-by-step sequence 
from the statement of purpose to the specification of plans for data collection. The 
accomplishment phase is conducted in the reverse order, beginning with the collection
of data. In each phase, each step is determined by the steps that precede
it. Each step also helps determine the steps that follow. 

The ability to revise is an essential part of the planning phase. It is also an 
essential (although one hopes infrequently used) part of the accomplishment phase. 
For example, during the planning phase, you may find that you cannot meet the 
data needs of the study, and therefore that you cannot accomplish the planned 
analyses. In this case you must revise the planned analyses to use obtainable data 
that will still give the information needed. If you cannot make the necessary 
revision, you must either revise the specific objectives, devise some method of 
obtaining the required data, or drop the study. 

Studies that cannot be successfully carried out should be identified and dropped 
during the planning phase. It is a shame not to do studies that need to be done 
and can be done with a reasonable effort. However, it is perhaps even worse to 
conduct a study that is unsuccessful, simply because of poor planning. 

EXAMPLE 1.4.1 A large retail parts and service center has been having trouble with 
high inventory costs and frequent “stockouts” leading to customer complaints. 
The firm hires a specialist in inventory control as a consultant. After analyzing 
information on the quantities of each part ordered and used over the past several 
months, the specialist proposes a new and more sophisticated inventory system. 
The stock manager is receptive to the new system, but wants to conduct a study 
that will make possible a comparison of the two systems, and perhaps provide 
documentation that the new system is better. 

The stock manager and inventory specialist agree that the study should be 
carefully planned to ensure that the desired comparisons can be effectively made. 
They agree that the general objective of the study is to determine which of the 
two inventory systems is better. They state measurable specific objectives in terms 
of comparisons of costs, the number of stockouts, and the number of customer 
complaints. They then plan the analyses necessary to achieve the specific objectives.
Next they determine the actual data items needed and make plans to collect
the necessary data. 

If their comparisons of the two systems are to be valid, they must collect 
comparable data for both systems. In this instance the data needed may already 
be available for the current system. If so, they can simply implement the new 
system and collect the needed data for it. However, if the data are not already 
available for the current system, the researchers must continue to use it until they 
obtain the data needed. Then they can implement the new system. 

In the chapters that follow we will introduce a number of basic statistical techniques
and concepts. We hope that you will gain enough mastery of the material
so that, when you are in a decision-making situation requiring a knowledge of
statistics, you will be able to make a positive contribution. If this contribution
consists of no more than recognizing that the problem requires a higher level of
statistical expertise than you possess, the time you spend studying this text will 
not have been wasted. 


Summary 


This chapter was concerned with the business firm’s increased need for quality 
information. Three sources of information were identified: (1) the routine operation 
of the firm, (2) special data-gathering projects, and (3) special research projects. 
Sources (2) and (3) were referred to as special studies. A step-by-step procedure 
for conducting a special study was suggested. The importance of statistics to the 
manager or researcher seeking to meet a firm’s informational needs was emphasized.

A wealth of material is available that delves deeper into the general
considerations and more philosophical aspects of special studies. The following
books are a sampling.

A general treatment of the scientific method is given by Walker (1963), Ackoff 
(1962), and Kerlinger (1973). Business research in general is the subject of the 
books by Ferber and Verdoorn (1962), Nemmers and Myers (1966), Roberts 
(1964), Murdick (1969), and Rummel and Ballaine (1963). For a discussion of 
research in the field of marketing, see Green and Tull (1978). For coverage of 
research in the area of production management, see the books by Gedye (1965), 
Johnson et al. (1972), Starr (1971), Levin et al. (1972), and Buffa (1977). 

The chapters that follow are concerned primarily with two areas: (1) the analysis 
of data resulting from special studies and routine operations, and (2) the concepts 
on which these analyses are based. 


2. Organizing and Summarizing Data


Chapter Objectives: This chapter teaches you some of 
the basic techniques used in describing and summarizing 
important characteristics of a set of data. It will help you 
to understand and be able to use these techniques. These 
skills are essential for handling much of the material in 
the remainder of this text. After studying this chapter 
and working the exercises, you should be able to: 

1. Use some basic vocabulary necessary for understanding
statistics

2. Organize and summarize data so that they can be better
understood

3. Communicate, by means of graphs, the important information
contained in a set of data

4. Compute numerical quantities that measure the central
tendency and dispersion of a set of data



2.1 INTRODUCTION 


Computers and Business Statistics

Using Computers with This Text

We may conveniently present the concepts and techniques of applied statistics 
under two broad headings: descriptive statistics and inferential statistics. Under 
the heading of descriptive statistics, we examine ways of organizing, summarizing,
and presenting statistical data. Under the heading of inferential statistics, we
deal with the concepts and techniques involved in reaching conclusions (making
inferences) about a body of data when we examine only part of the data. This
chapter introduces the more important methods and concepts of descriptive statistics.
The next three chapters will present basic concepts necessary for understanding
statistical inference, the subject of most of the rest of the book.

The relatively recent widespread use of computers has had a tremendous impact 
on business research in general and statistical analysis in particular. The computer 
has greatly reduced the need for laborious hand calculations. Computers can perform
more calculations faster, and far more accurately, than can humans. Through
efficient use of computers, business researchers can now devote more time to 
improving the quality of raw data and interpreting the results. 

There are canned computer programs available that perform most of the descriptive
and inferential statistical procedures that the business researcher is likely
to need. Some widely used “packages” of statistical procedures are BMDP:
Biomedical Computer Programs [Dixon and Brown, 1979]; SPSS: Statistical Package
for the Social Sciences [Nie et al., 1975]; The IMSL Library (1979); Minitab
[Ryan et al., 1976]; and SAS [Barr et al., 1979]. Dixon and Jennrich (1972), in
a review article, describe 38 different packaged statistical programs. They give
facts about the machines on which the programs may be run, core memory requirements,
program languages, available documentation, and sources of additional
information.

The American Statistician regularly features a special section on statistical computing.
This section gives (1) summaries of selected committee reports dealing
with statistical computing, (2) announcements of new and/or newly updated program
packages, (3) sources of further information, and (4) announcements and
reviews of new computing products of interest to statisticians. Articles of general
interest from the series include those by Francis et al. (1975), Thisted (1979),
Muller (1980), and Chambers (1980). 

Statistical programs differ with respect to their input requirements, their output 
formats, and the specific calculations they perform. If you wish to use the computer
to obtain solutions to the exercises in this book, you should become familiar
with the programs available at your computer installation. You must determine,
first of all, whether there is an existing program that will do the required calculations.
Once you locate an appropriate program, study its input requirements so
that you can enter the data of the exercises into the computer correctly. Finally,
study the program’s output format to ensure that you will interpret the results
properly. If you have studied a computer language, you may, in some instances, 
wish to write your own computer programs for use with the exercises. 

The programs in statistical program packages can perform the calculations for 
many of the exercises in this book. In particular, the computer is a useful tool for 
calculating descriptive measures and constructing various distributions from large 
sets of data. 

This third edition of our text puts a greater emphasis on computer applications 
than previous editions. Appendix II presents a “population” of 1,000 fictitious
heads of households. For each one, we give recorded values of ten variables: sex,
marital status, age, occupation, education, commuting distance to work, number
of years with current employer, annual income, size of family, and size of residence.
This data base offers you many chances to simulate actual research projects.

Before we discuss descriptive statistics, let us define some terms that will help 
you to understand that subject. 


2.2 SOME BASIC VOCABULARY 

In this section we shall define some basic terms that you will use later. 

Entity When statisticians make observations about persons, places, and things, 
they call that which is being observed an entity, regardless of the type of unit 
involved. 

Variable A characteristic that assumes different values for different entities is 
called a variable. By contrast, a characteristic that retains the same value from 
entity to entity is called a constant. Examples of variables are heights of adult 
Army volunteers, number of customers entering a store each day, and the color 
of people’s eyes. The different values of a variable that one observes (or measures) 
are called observations. 

Random Variable If one can specify, for a given variable, a mathematical expression,
called a function, that gives the relative frequency of occurrence of the values
that the variable can assume, the function is called a probability function and the
variable is called a random variable. The value that a random variable assumes 
in a given situation is thought of as arising from chance factors. The term variate 
is frequently used as a synonym for random variable. Although this is not a 
rigorous definition of random variable, it suffices for our purposes here. 
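
As an informal illustration of a probability function, consider the number showing on one roll of a fair six-sided die. The die and the name die_probability are our own assumptions for illustration and do not come from the text; this is only a minimal sketch of the idea.

```python
from fractions import Fraction


def die_probability(value: int) -> Fraction:
    """Probability function for one roll of a fair six-sided die (hypothetical example)."""
    # Each face 1..6 is equally likely; any other value cannot occur.
    return Fraction(1, 6) if value in range(1, 7) else Fraction(0)


# The probabilities over all values the variable can assume add up to 1.
total = sum(die_probability(v) for v in range(1, 7))
print(die_probability(3), total)
```

Here the random variable is the number showing on the die, and the function gives the relative frequency with which each value is expected to occur.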

Quantitative Variable A quantitative variable is one whose values are expressible 
as numerical quantities, such as measurements or counts. Height, which is a 
measurement, and number of customers, which is a count, are examples of quantitative
variables.



Qualitative Variable A qualitative variable is one that is not measurable, in the 
sense that height is measured, or countable, as are people entering a store. Many 
characteristics can be classified only. Examples are designating items of a firm’s 
output as defective or not defective, or saying that the color of a person’s eyes is 
blue, green, or brown. Such a variable is a qualitative variable. 

Discrete Variable A discrete variable is one that can assume only certain values 
within an interval. The number of customers that enter a store on a given day is 
an example of a discrete variable, since we cannot speak meaningfully of 1.5 
customers or 2.78 customers. A discrete variable is characterized by “interruptions”
between the values that the variable can assume.

Continuous Variable The interruptions, or gaps, that are characteristic of a discrete
variable do not occur with a continuous variable. There is a continuum of
values that a continuous variable can assume: all the whole numbers and all
values in between. Height, for example, is a continuous variable, since people do
not come in heights expressed by only certain values. One person may be 72.12341 
inches tall (assuming that there is a measuring device that will give such a precise 
reading). Another person may be slightly taller, and the measuring device may 
show this person to be 72.12345 inches tall. No matter how close together two 
people’s heights may be, it is always possible, theoretically, to find another person 
whose height is somewhere in between. 
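
The distinctions drawn so far can be summarized in a short sketch that classifies the examples used in the definitions above. The Variable class and its field names are hypothetical, introduced here only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variable:
    name: str
    quantitative: bool          # True if values are measurements or counts
    continuous: Optional[bool]  # None when the variable is qualitative

    def describe(self) -> str:
        if not self.quantitative:
            return f"{self.name}: qualitative"
        kind = "continuous" if self.continuous else "discrete"
        return f"{self.name}: quantitative, {kind}"


# Examples taken from the definitions in this section.
examples = [
    Variable("height of an adult", quantitative=True, continuous=True),
    Variable("customers entering a store per day", quantitative=True, continuous=False),
    Variable("eye color", quantitative=False, continuous=None),
    Variable("item defective or not defective", quantitative=False, continuous=None),
]

for variable in examples:
    print(variable.describe())
```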

Population The largest collection of values of some variable in which there is 
interest constitutes the population of these values. If the heights of all college 
students in the United States are of interest, the population consists of all these 
heights. If interest does not extend beyond, say, a particular classroom of college 
students, the number of people involved has decreased considerably, but we are 
still talking about a population, because we are interested in the heights of the 
students in only this classroom. Thus a population is created merely by defining 
the collection of values of interest. 

The word population can also refer to a collection of entities. It is often better 
to refer to entities (such as persons) rather than to the measurements (values) taken 
on these entities. In any case, the ultimate interest is always in the measurements 
taken on the entities, not in the entities themselves. It is always clear from the 
context whether reference is to a collection of entities or to a collection of numerical
values.

Sample A sample is a part of a population. If the population is defined as the 
heights of all college students in the United States, the heights of students in a 
college classroom in Michigan would constitute a sample from that population. 

Random Sample A random sample is a sample drawn in such a way that the 
results of an analysis of it may be used to make inferences about the population 
from which it was drawn. There are many methods of selecting a sample from a
population, but not all of them yield samples that provide a good basis for making
inferences about the population from which the sample was drawn. Consider, for
example, the population of all students enrolled at a college during a given term.
A particular class, say a class in statistics, would be a sample from this population.
Do the students in the class provide a sample that is suitable for making
an inference about all students enrolled at the college? 

A more technical definition of random sample will be given later. The methods 
of selecting the sample, analyzing it, and drawing conclusions about the population 
form the essence of this book and will be dealt with in detail in succeeding 
sections. 
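
To make the ideas of population and sample concrete, the following sketch simulates a hypothetical population of 1,000 heights and draws a sample from it. The simulated values, the sample size of 25, and the assumption that every set of 25 values is equally likely to be chosen are ours, since the technical definition of a random sample is deferred to a later section.

```python
import random

random.seed(0)  # fixed seed only so that the sketch is repeatable

# Hypothetical population: heights (in inches) of 1,000 students, simulated here.
population = [round(random.gauss(68, 3), 1) for _ in range(1000)]

# Draw 25 values without replacement; every set of 25 is equally likely.
sample = random.sample(population, k=25)

print(f"population size: {len(population)}, sample size: {len(sample)}")
print(f"population range: {min(population)} to {max(population)}")
print(f"sample range:     {min(sample)} to {max(sample)}")
```

By contrast, taking the first 25 values in the list, like taking one particular class from a college, would also produce a sample, but not necessarily one suited to making inferences about the whole population.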


2.3 SUMMARIZING DATA: THE ORDERED ARRAY 

Data from a special study or from routine business records are usually available 
to a researcher or manager as an unorganized mass of observations. 

If the number of observations is not too great, a frequent first step in organizing 
the data is the preparation of an ordered array. An ordered array is a list of the 
observations in order of increasing magnitude from the smallest value to the 
largest. By looking at an ordered array, the researcher can get a feel for the 
magnitude of the observations. If more calculations and further organization of 
the data have to be done with pencil, paper, and a calculator, these operations 
will be much easier if an ordered array is first prepared. On the other hand, if all 
calculations are done by a computer, preparation of an ordered array may not be 
desirable. 

EXAMPLE 2.3.1 A business firm wants to analyze the characteristics of its employees
(entities). A sample of 100 employees is selected, and the age at nearest
birthday of each is determined. The ages are obtained from individual employee 
records filed alphabetically in the personnel department. Table 2.3.1 shows the 
ages as recorded. Table 2.3.2 shows the ordered array that is prepared from the 
original list. According to this ordered array, the youngest employee in the sample 
is 17 and the oldest is 63. This information is hidden in Table 2.3.1. The ordered 
array also facilitates the tabular presentation of data, as we’ll see in the next 
section. 
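As a minimal sketch of the idea, an ordered array can be produced with Python's built-in sorted function. The list below holds only the first ten ages of Table 2.3.1 in the order recorded; the variable names are illustrative and are not part of the original example.

```python
# First ten ages from Table 2.3.1, in the order recorded (illustrative subset).
recorded_ages = [60, 63, 39, 22, 23, 32, 30, 52, 29, 46]

# An ordered array lists the observations from smallest to largest.
ordered_array = sorted(recorded_ages)

print(ordered_array)      # [22, 23, 29, 30, 32, 39, 46, 52, 60, 63]
print(ordered_array[0])   # smallest value in the subset: 22
print(ordered_array[-1])  # largest value in the subset: 63
```

Once the data are ordered, the smallest and largest values can be read directly from the two ends of the list.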


TABLE 2.3.1 Ages of 100 employees

60  63  39  22  23  32  30  52  29  46
26  35  29  25  41  28  40  33  32  33
20  25  42  34  29  43  41  31  30  36
58  21  24  55  51  28  18  40  44  38
32  21  30  31  25  49  31  26  33  36
43  34  35  22  33  38  34  34  33  34
23  26  57  23  26  36  39  31  35  34
34  51  40  50  35  45  28  36  32  39
26  48  17  45  45  25  25  30  36  30
43  25  27  21  53  25  38  33  37  33



TABLE 2.3.2 Ordered array prepared from 100 ages in Table 2.3.1
(values increase down each column, then continue at the top of the next column)

17  23  26  29  32  33  35  38  43  50
18  24  26  30  32  34  35  39  43  51
20  25  26  30  32  34  36  39  43  51
21  25  26  30  32  34  36  39  44  52
21  25  27  30  33  34  36  40  45  53
21  25  28  30  33  34  36  40  45  55
22  25  28  31  33  34  36  40  45  57
22  25  28  31  33  34  37  41  46  58
23  25  29  31  33  35  38  41  48  60
23  26  29  31  33  35  38  42  49  63


2.4 SUMMARIZING DATA: THE FREQUENCY DISTRIBUTION 

Although the ordered array helps to convey the information contained in a set of 
data, it is hard to grasp a large number of observations, even though they are 
ordered according to magnitude. We can summarize further by grouping the data 
into class intervals. 

Class intervals are contiguous, nonoverlapping intervals selected in such a way 
that they are mutually exclusive and exhaustive. That is, each and every value 
in the set of data can be placed in one, and only one, of the intervals. 

For example, you can summarize the individual incomes of a group of employees 
by showing the number falling into each of several class intervals, such as 

< $5,000 
5,000- 9,999 
10,000-14,999 
15,000-19,999 
20,000-24,999 
25,000-29,999 
30,000 and over 


After we have determined class intervals, we examine the data and count the 
number of values falling into each interval. The result is a frequency distribution. 
It can be displayed as either a table or a graph. We can define it as follows: 

A frequency distribution is any device, such as a graph or table, that displays 
the values that a variable can assume along with the frequency of occurrence 
of these values, either individually or as they are grouped into a set of mutually 
exclusive and exhaustive intervals. 

One of the first things to consider when data are to be grouped is the number 
of intervals to include. Using too few intervals results in an excessive loss of 
information. Using too many defeats the purpose of summarization. You usually 
should not use fewer than 6 intervals or more than 15. When deciding how many
class intervals to have, you need to be familiar with the data and to understand 
the purposes of grouping. 



Those who wish more specific guidelines for determining the number of class
intervals may use a formula given by Sturges (1926). If we let k equal the number
of class intervals and n equal the number of observations, Sturges’ rule tells us
that the number of class intervals should be

$$k = 1 + 3.322(\log_{10} n) \tag{2.4.1}$$

We should not regard the number of class intervals indicated by Sturges’ rule
as final. The actual number of class intervals we use may be more or less than
the number k obtained by the formula if this will make for greater convenience
and clarity. Suppose, for example, that we wish to construct a frequency
distribution from 150 observations. Application of Sturges’ rule yields

$$k = 1 + 3.322(\log_{10} 150) = 1 + 3.322(2.1761) \approx 8$$

Another decision to be made when grouping data concerns the width of the 
intervals. As a general rule, all the intervals should be the same width. We should 
also select a width that is convenient to work with. 

We may approximate the width of the class interval by dividing the range by 
k. The range is the difference between the largest and the smallest value in a set 
of data. Let R be the range. The approximate width of the class interval is given 
by R/k. The class-interval width determined in this manner is often not an integer 
and must be rounded up or down. Also frequently R/k yields a class-interval width 
that is undesirable because it is inconvenient to work with or because it is one 
that is not customarily used with the data under consideration. Class-interval 
widths of 5 units, 10 units, or some multiple of 10 units are desirable, since people 
can grasp them more readily. 
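The two calculations just described, Sturges' rule and the approximate width R/k, can be sketched in a few lines of Python. The function names are illustrative, and the smallest and largest values passed to the width function (15 and 95) are hypothetical numbers chosen only to show the arithmetic.

```python
import math

def sturges_k(n):
    """Number of class intervals suggested by Sturges' rule (Equation 2.4.1)."""
    return 1 + 3.322 * math.log10(n)

def approximate_width(smallest, largest, k):
    """Approximate class-interval width: the range R divided by k."""
    return (largest - smallest) / k

# The 150-observation illustration from the text.
k_150 = sturges_k(150)
print(round(k_150, 2), round(k_150))  # 8.23 8

# Hypothetical data whose smallest and largest values are 15 and 95.
width = approximate_width(15, 95, round(k_150))
print(width)  # 10.0
```

The value returned by the rule is only a starting point; as the text notes, the final number of intervals and their width are usually adjusted to something convenient such as 5 or 10 units.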


EXAMPLE 2.4.1 To understand how to group data, consider the employee ages in
Table 2.3.2. An examination of these data indicates that 5-year intervals, beginning
with the interval 15 through 19, would adequately summarize the data.

Now let us see how closely the results obtained by applying Sturges’ rule agree
with our subjective judgment. By Sturges’ rule, we have

$$k = 1 + 3.322(\log_{10} 100) = 1 + 3.322(2) \approx 8$$

From Table 2.3.2 we see that R = 63 - 17 = 46, so that

$$\frac{R}{k} = \frac{46}{8} = 5.75$$

Since we prefer intervals whose widths are 5 units or some multiple of 10 units, 
we have a choice here between using 5-year intervals and using 10-year intervals. 
If we were to use interval widths of 10 years, we would have only 5 class intervals, 
one fewer than the recommended minimum of 6. Hence 5-year intervals seem 
best here. 

Specifying the intervals as suggested, and counting the number of observations
that fall into each, gives the frequency distribution shown in Table 2.4.1.

TABLE 2.4.1 Frequency distribution of the ages of 100 employees

Age (in years)    Frequency
15-19                  2
20-24                 10
25-29                 19
30-34                 27
35-39                 16
40-44                 10
45-49                  6
50-54                  5
55-59                  3
60-64                  2
Total                100

This table enables us to ascertain, at a glance, various features of the data. For example,
more employees are in the age group 30-34 than in any other group. The number
in each group decreases in both directions from this interval. In Table 2.4.1 the
numbers 15, 20, 25, 30, and so on, are the lower class limits. The numbers 19,
24, 29, 34, and so on, are the upper class limits. These numbers determine the
magnitude of the observations that go into a given interval.
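The counting step can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary AGE_COUNTS is a compact transcription of Table 2.3.2 (each age mapped to the number of employees with that age), and the function name and parameters are hypothetical.

```python
# Ages from Table 2.3.2, stored as {age: number of employees with that age}.
AGE_COUNTS = {
    17: 1, 18: 1, 20: 1, 21: 3, 22: 2, 23: 3, 24: 1, 25: 7, 26: 5, 27: 1,
    28: 3, 29: 3, 30: 5, 31: 4, 32: 4, 33: 7, 34: 7, 35: 4, 36: 5, 37: 1,
    38: 3, 39: 3, 40: 3, 41: 2, 42: 1, 43: 3, 44: 1, 45: 3, 46: 1, 48: 1,
    49: 1, 50: 1, 51: 2, 52: 1, 53: 1, 55: 1, 57: 1, 58: 1, 60: 1, 63: 1,
}

def frequency_distribution(value_counts, start, width, n_classes):
    """Count the observations falling in each class interval.

    Interval j runs from start + j*width through start + (j+1)*width - 1,
    which suits whole-number data such as ages at nearest birthday.
    """
    freqs = [0] * n_classes
    for value, count in value_counts.items():
        j = (value - start) // width
        if not 0 <= j < n_classes:
            raise ValueError(f"{value} falls outside the class intervals")
        freqs[j] += count
    return freqs

freqs = frequency_distribution(AGE_COUNTS, start=15, width=5, n_classes=10)
for j, f in enumerate(freqs):
    lower = 15 + 5 * j
    print(f"{lower}-{lower + 4}: {f}")
```

Because the interval width is a parameter, the same hypothetical function can be used to experiment with wider or narrower intervals on the same data.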

The choice of class limits reflects the extent to which the values being grouped 
are rounded off. The employee ages in the present example are rounded to the 
nearest year, since it was the age at the nearest birthday that was recorded. An 
employee between 24 and 24.5 would be counted in the second class interval, 
whereas one who is between 24.5 and 25 would be counted in the third. Thus 
24.5 is really the boundary between the second and third class intervals. Similar
boundaries between the other class intervals may be determined. These are sometimes
referred to as the class boundaries or true class limits. For the employee
ages, they are

14.5-19.5   19.5-24.5   24.5-29.5   29.5-34.5   34.5-39.5

39.5-44.5   44.5-49.5   49.5-54.5   54.5-59.5   59.5-64.5

Sometimes one wants a cumulative frequency distribution. Table 2.4.2 shows 
this for the 100 employee ages. We obtain the entries in the cumulative frequency 
column by adding the number of observations in a given interval to the cumulated 
number of observations from the first interval through the preceding interval, 
inclusive. The cumulative frequency distribution tells us quickly how many observations
are below a certain value. For example, Table 2.4.2 shows that 74
employees are under 39.5 years old. 
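A running total of this kind can be sketched with itertools.accumulate from the Python standard library. The frequencies are those of Table 2.4.1; the variable names are illustrative.

```python
from itertools import accumulate

# Class intervals and frequencies from Table 2.4.1.
labels = ["15-19", "20-24", "25-29", "30-34", "35-39",
          "40-44", "45-49", "50-54", "55-59", "60-64"]
frequencies = [2, 10, 19, 27, 16, 10, 6, 5, 3, 2]

# Each cumulative entry adds the current frequency to the running total.
cumulative = list(accumulate(frequencies))

for label, f, c in zip(labels, frequencies, cumulative):
    print(f"{label}  {f:>3}  {c:>3}")

# Number of employees under 39.5 years old (through the interval 35-39).
print(cumulative[labels.index("35-39")])  # 74
```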

Relative Frequencies We may at times wish to know what proportion of the observations
under study fall within a certain class interval. We find this by dividing the number
of values in that class interval by the total number of values. In Example 2.4.1, we find
the proportion of observations in the class interval 15-19 by dividing 2 by 100.
That is, 2 ÷ 100 = 0.02. We refer to this as the relative frequency of occurrence
of observations in that interval.
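As a small sketch, relative frequencies can be computed by dividing each class frequency by the total. The same division applied to running totals gives cumulative proportions. The frequencies come from Table 2.4.1; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies from Table 2.4.1 for the intervals 15-19, 20-24, ..., 60-64.
frequencies = [2, 10, 19, 27, 16, 10, 6, 5, 3, 2]
total = sum(frequencies)

# Relative frequency: class frequency divided by the total number of values.
relative = [f / total for f in frequencies]

# Cumulative relative frequency: cumulative frequency divided by the total.
cumulative_relative = [c / total for c in accumulate(frequencies)]

print([round(r, 2) for r in relative])
print([round(c, 2) for c in cumulative_relative])
```

Rounding is applied only when printing, so that small floating-point effects do not appear in the displayed proportions.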





TABLE 2.4.2 Cumulative frequency distribution of the ages of 100 employees

Age (in years)    Frequency    Cumulative frequency
15-19                  2                  2
20-24                 10                 12
25-29                 19                 31
30-34                 27                 58
35-39                 16                 74
40-44                 10                 84
45-49                  6                 90
50-54                  5                 95
55-59                  3                 98
60-64                  2                100
Total                100

Just as we may construct cumulative frequency distributions, we may also
construct cumulative relative frequency distributions. We can obtain cumulative
relative frequencies in one of two ways: We can cumulate individual relative
frequencies, or we can divide cumulative frequencies by the total number of
observations. Table 2.4.3 shows the relative frequency and cumulative relative
frequency distributions for Example 2.4.1.

TABLE 2.4.3 Relative frequency and cumulative relative frequency distributions of
ages of 100 employees

Age (in years)    Relative frequency    Cumulative relative frequency
15-19                   0.02                      0.02
20-24                   0.10                      0.12
25-29                   0.19                      0.31
30-34                   0.27                      0.58
35-39                   0.16                      0.74
40-44                   0.10                      0.84
45-49                   0.06                      0.90
50-54                   0.05                      0.95
55-59                   0.03                      0.98
60-64                   0.02                      1.00
Total                   1.00

Effects of Too Few or Too Many Class Intervals Let us now illustrate the effects of
having too few or too many class intervals. Suppose that, in Example 2.4.1, we had
used class-interval widths of 20. Table 2.4.4 shows the resulting frequency
distribution. We can see that using only three class intervals results in too much
loss of detail.

TABLE 2.4.4 Frequency distribution of ages of 100 employees using 20-year class
intervals

Age (years)    15-34    35-54    55-74    Total
Frequency         58       37        5      100

Now consider the same example, this time using three-year class intervals.
Table 2.4.5 shows the results of using too many class intervals. The frequency
distribution in Table 2.4.5 does not condense the original data enough to make
clear the information they contain.

TABLE 2.4.5 Frequency distribution of ages of 100 employees using three-year
class intervals

Age (years)    Frequency    Age (years)    Frequency
15-17               1       42-44               5
18-20               2       45-47               4
21-23               8       48-50               3
24-26              13       51-53               4
27-29               7       54-56               1
30-32              13       57-59               2
33-35              18       60-62               1
36-38               9       63-65               1
39-41               8       Total             100

Unequal Class Intervals As noted earlier, all class intervals for a frequency
distribution should usually be of the same width. Sometimes, however, it may be
impossible or undesirable to have class intervals of equal width. Unequal class
intervals are preferred, for example, when there are one or two extremely small or
extremely large values in the set of data. In such cases, we may use an initial class
interval labeled “less than . . .” or a terminal class interval labeled
“greater than . . . .” This avoids one or more equal class intervals containing zero
frequencies. The disadvantage of such open-end class intervals is that there is no
way of knowing their true widths unless we use some special notation to convey this
information. In some instances we may need to use unequal class intervals at places
other than at the ends of a distribution to better communicate the true nature of
the data.


2.5 SUMMARIZING DATA: THE HISTOGRAM AND FREQUENCY POLYGON

A frequency distribution may be portrayed graphically. This method of representing
data has the usual advantages of graphical presentations. We can see the
salient features of the data without having to interpret a column of numbers.

Histogram One way of graphically representing a frequency distribution or a relative
frequency distribution is by means of a histogram. In a histogram, we plot the
variable under consideration on the horizontal axis, and the frequency (or relative
frequency) on the vertical axis. We locate the class intervals on the horizontal
axis, and above each we erect a vertical bar, or cell. The height of a bar corresponds
to the frequency (or relative frequency) of observations in the class interval
above which it is erected. We also make the adjacent cells of a histogram contiguous.

FIGURE 2.5.1 Histogram of ages of 100 employees



Figure 2.5.1 shows the histogram for the data in Table 2.4.1. Since there are 
two ages in the 15-19 interval, the height of the cell for that interval is two units. 
The next cell is 10 units high, since there are 10 ages in the interval 20-24. The 
lower limits of the intervals show, on the horizontal axis, the points of separation 
between adjacent cells. 

We may also use the true class limits to label the horizontal axis of a histogram. 
However, we may find it more meaningful to use the lower class limits (as in 
Figure 2.5.1), the upper class limits, or both. 
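As a rough sketch of how the bar heights of a histogram follow from a frequency distribution, the frequencies of Table 2.4.1 can be printed as a text chart. This is an assumption-laden stand-in for the figure: the bars are drawn horizontally (a histogram turned on its side), and the function name and the choice of "*" as the bar symbol are illustrative.

```python
# Class intervals and frequencies from Table 2.4.1.
labels = ["15-19", "20-24", "25-29", "30-34", "35-39",
          "40-44", "45-49", "50-54", "55-59", "60-64"]
frequencies = [2, 10, 19, 27, 16, 10, 6, 5, 3, 2]

def text_histogram(labels, frequencies, symbol="*"):
    """Return one line per class interval; bar length equals the frequency."""
    lines = []
    for label, f in zip(labels, frequencies):
        lines.append(f"{label:>5} | {symbol * f} ({f})")
    return lines

lines = text_histogram(labels, frequencies)
print("\n".join(lines))
```

Each row plays the role of one cell of the histogram: the row for 15-19 is two symbols long and the row for 20-24 is ten symbols long, matching the heights described for the first two cells.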

Frequency Polygon An alternative kind of graph for a frequency distribution is the
frequency polygon. To construct this graph, we place a dot above the center of each
class interval at a height corresponding to the frequency for that interval. We then
connect the dots with straight lines.

You can make a frequency polygon touch the horizontal axis at both ends by 
extending it to the center of an imaginary class interval at each end. Figure 2.5.2 
shows a frequency polygon for the data in Table 2.4.1 superimposed over the 
corresponding histogram. This figure illustrates the relationship between these two 
graphic devices. Generally, the two graphs are not shown together in this manner. 
They are shown separately when both are desired, or alone when only one is 
wanted. Figure 2.5.3 shows, by itself, the frequency polygon for the frequency 
distribution of Table 2.4.1. 
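The points of a frequency polygon can be sketched numerically. The example below assumes the 5-year intervals of Table 2.4.1, takes the center of the interval 15-19 to be 17, and adds one imaginary zero-frequency interval at each end so that the polygon touches the horizontal axis. The variable names are illustrative.

```python
# Lower class limits and frequencies from Table 2.4.1.
lower_limits = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
frequencies = [2, 10, 19, 27, 16, 10, 6, 5, 3, 2]
width = 5

# Center of each interval, e.g. 15-19 has center 17.
midpoints = [lower + (width - 1) / 2 for lower in lower_limits]

# Add the centers of imaginary intervals (10-14 and 65-69) with frequency 0.
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + width, 0)]
)

for x, y in points:
    print(x, y)
```

Joining consecutive points with straight lines gives the polygon; the first and last points lie on the horizontal axis.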


FIGURE 2.5.2 Frequency polygon and histogram of ages of 100 employees

FIGURE 2.5.3 Frequency polygon of ages of 100 employees

Note: We did not use true class limits to label the horizontal axis in Figures
2.5.1 and 2.5.2. Therefore these graphs are shifted one-half unit to the right. For
example, the true limits for the first interval are 14.5-19.5, whereas the figures
use the limits 15-20.

Ogive Graphs of cumulative frequency distributions often help to describe the nature
of data under analysis. This type of graph, which resembles a frequency polygon, is
called an ogive. To construct an ogive, we place a dot above each lower class
limit on the horizontal axis at a height corresponding to the cumulative frequency
through the previous interval. We then connect these dots by straight lines. Any
point on the ogive represents the number of observations that are less than the
value directly below it on the horizontal axis. Figure 2.5.4 shows the ogive for
the cumulative frequency distribution of Table 2.4.2.

FIGURE 2.5.4 Ogive for cumulative frequency distribution of Table 2.4.2
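The points of an ogive can be sketched in the same spirit. The example assumes the lower class limits and frequencies of Table 2.4.2, and it adds one extra point at 65 (the lower limit of the interval that would follow 60-64) so that the curve reaches the total of 100. The variable names are illustrative.

```python
from itertools import accumulate

# Lower class limits and frequencies from Table 2.4.2.
lower_limits = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
frequencies = [2, 10, 19, 27, 16, 10, 6, 5, 3, 2]
width = 5

# Above each lower limit, plot the cumulative frequency through the
# previous interval; the first point therefore has height 0.
cumulative_before = [0] + list(accumulate(frequencies))
x_values = lower_limits + [lower_limits[-1] + width]
ogive_points = list(zip(x_values, cumulative_before))

for x, y in ogive_points:
    print(x, y)
```

For instance, the point above 40 has height 74, the number of observations in the intervals 15-19 through 35-39.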






Other ways of presenting frequency distributions graphically are described in 
books devoted exclusively to the construction of graphs, such as those by Schmid 
and Schmid (1979), Smart and Arnold (1951), and Spear (1969). 

2.5.1 The following figures are the number of miles (rounded to the nearest 1000) driven
during a certain year by 110 salespeople. Prepare an ordered array, a frequency distribution,
a cumulative frequency distribution, a histogram, and a frequency polygon for these data.

40  26  41  40  39  34  61  42  47  23
18  43  29  93  46  32  44  71  45  62
36  22  49  31  35  36  84  81  51  51
52  66  34  55  44  18  33  38  28  42
11  48  55  42  65  54  97  67  88  44
39  42  35  50  90  73  60  41  40  29
24  58  47  53  45  84  30  31  32  34
48  76  38  52  63  41  73  36  50  31
56  35  15  26  28  41  45  61  32  27
75  30  68  24  37  30  20  50  52  10
65  52  20  36  38  38  43  21  55  48


2.5.2 The following is an ordered array of the amounts of money (in millions of dollars) 
on deposit in each of 100 banks on a certain date. Prepare a frequency distribution, a 
cumulative frequency distribution, a histogram, and a frequency polygon for these data. 


 0.9   2.3   5.0   6.0   8.6  10.3  13.7  16.1  21.3  27.5
 1.1   2.4   5.1   6.1   8.7  10.5  13.9  16.4  21.2  28.3
 1.5   2.5   5.2   6.1   8.8  11.1  14.0  16.3  22.4  28.3
 1.7   2.7   5.2   6.5   9.3  11.2  14.2  17.2  23.6  29.0
 1.8   3.0   5.4   6.8   9.4  11.8  14.4  17.1  23.8  29.4
 1.9   3.2   5.5   6.9   9.5  12.1  14.5  18.8  24.0  30.0
 1.9   3.7   5.6   7.1   9.6  12.2  14.6  19.0  24.2  30.1
 2.0   4.2   5.7   7.3   9.8  13.5  15.1  19.5  25.4  30.5
 2.0   4.6   5.8   7.4   9.8  13.6  15.3  20.4  25.2  33.2
 2.1   4.9   5.8   8.2  10.1  13.6  15.6  20.5  26.2  34.4


2.5.3 As part of its screening process in hiring new assembly-line employees, a company 
gives each applicant an aptitude test. The following are the scores made by the last 100 
applicants. Prepare a frequency distribution, a cumulative frequency distribution, a
histogram, and a frequency polygon for these data.

49  86  40  45  48  93  97  58  58  98
58  82  52  56  50  85  80  60  62  80
62  72  65  60  64  70  78  67  69  88
60  72  66  66  65  75  78  62  64  74
68  72  67  61  62  72  79  71  74  73
76  69  73  78  73  78  78  74  73  69
76  65  74  75  78  60  62  72  74  72
70  66  77  78  77  64  65  77  82  61
88  51  87  84  84  54  50  82  88  65
81  46  87  83  94  41  49  90  98  52


2.5.4 In a study of the history of small business firms in a certain area, researchers collect
data on the length of time (in years) that 120 such firms existed before going out of
business. The results are shown in the following table. (a) Construct a frequency
distribution, a relative frequency distribution, a cumulative relative frequency
distribution, and a histogram from these data. (b) Construct an ogive from the data.



 3   4   4   3   8  15   5  10  15  25   4   6
 3   5   1   4   7  15   1  10  14  23   8   2
 5   4   4   1   6  11   1  10  14  23  11  25
 1   5   4   5   8  11   1  10  15  21  16  24
 3   5   2   4

2.5.5 The following are the lengths of service, in years, of 137 employees of a certain
firm. Only employees with 10 or more years of service are included. From these data,
construct the following: (a) a frequency distribution, (b) a cumulative frequency
distribution, (c) a relative frequency distribution, (d) a cumulative relative frequency
distribution, (e) a histogram, (f) an ogive.


2.6 SUMMARIZING DATA: DESCRIPTIVE MEASURES

In addition to tabular and graphical methods of summarizing data, one also finds 
it useful to summarize data by methods that lead to numerical results, called 
descriptive measures. We shall discuss two types of descriptive measures: measures 
of central tendency and measures of dispersion. They may be computed from the 
data contained in a sample or from the data comprised by a finite population. In 
order to distinguish between the two types on the basis of whether they refer to 
a sample or a population, we use the following definitions: 

A descriptive measure computed from or used to describe a sample of data is 
called a statistic. 

A descriptive measure computed from or used to describe a population of data 
is called a parameter. 

Always be aware of the difference between a population and a sample. As
stated earlier, a population is the largest collection of observations for which we
have an interest in a given situation. Frequently it is impractical to analyze an
entire population, because of its size or for some other reason. Instead, we examine
a part of the population. This part of the population is called a sample. Throughout
this text we will use different symbols to distinguish between descriptive measures
that relate to a population and those that relate to a sample. Later you will learn
how to reach decisions about population characteristics on the basis of an analysis
of a sample drawn from that population.

Measures of Central Tendency Even when you draw a collection of data from a common
source, individual observations are not likely to have the same value. It is
impractical to keep in mind all the values that may be present in a set of data. What
we need is some single value that we may consider typical of the set of data as a
whole. The need for such a single value is usually met by one of the three measures
of central tendency: the arithmetic mean, the median, and the mode.

The Arithmetic Mean The most familiar measure of central tendency is the arithmetic
mean. Popularly known as the average, it is sometimes called the arithmetic
average, or simply the mean. We find it by adding all the values in a set of data 
and dividing the total by the number of values that were summed. 

EXAMPLE 2.6.1 A bus company uses extra drivers to handle demands for service 
beyond its routine schedule. A sample of five extra drivers drove the following 
number of hours during a certain week. 


Driver           A    B    C    D    E
Hours driven    17   28   35   42   45

To find the mean number of hours driven by this sample of drivers, add the five
numbers showing the hours driven and divide by 5. Thus we have

$$\text{Sample mean} = \frac{\text{sum of all values in the sample}}{\text{number of values in the sample}} \tag{2.6.1}$$

For the present example, we have

$$\text{Mean hours driven} = \frac{17 + 28 + 35 + 42 + 45}{5} = \frac{167}{5} = 33.4$$


We can do this in a more compact form by using the capital letter X to designate
the variable of interest, which here is the hours driven by extra drivers during a
week. Particular values of this variable may be represented by lower-case letters
as follows: $x_1, x_2, \ldots, x_n$, where the subscripts refer to the location of the
value in the sequence of data. For example, $x_1$ refers to the first value, $x_2$ to the
second value, and so on, to $x_n$, which represents the last value in a set of sample data.
For the present example, $x_1 = 17$, $x_2 = 28$, $x_3 = 35$, $x_4 = 42$, and $x_5 = 45$.
Note that the subscript n also indicates the size of the sample. If $\bar{x}$ denotes the
mean of the sample, we can write Equation 2.6.1 as

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \tag{2.6.2}$$

where the symbol $\sum_{i=1}^{n}$ (“summation from i = 1 to i = n”)
tells us to add the values of X from the first to the last. The subscript i on the x
following $\sum$ indicates a typical value from the series of values under study. From
now on, we will omit the i = 1 and the n when it is clear from the context what
they should be. We can write the formula for the sample mean, then, as

$$\bar{x} = \frac{\sum x_i}{n} \tag{2.6.3}$$

A more complete discussion of summation notation is given in Appendix III. 
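The sample-mean formula translates directly into code. The sketch below uses the hours driven in Example 2.6.1; the variable names are illustrative.

```python
# Hours driven by the five extra drivers in Example 2.6.1.
hours_driven = [17, 28, 35, 42, 45]

# Sample mean: sum of the values divided by the number of values.
n = len(hours_driven)
sample_mean = sum(hours_driven) / n

print(sum(hours_driven))  # 167
print(sample_mean)        # 33.4
```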

On occasion, we may have some finite population of values for which we want 
to compute the mean. The procedure for calculating the population mean is exactly 
the same as that for calculating the sample mean. 

To distinguish a sample mean from a population mean, we designate the population
mean by the Greek letter $\mu$ (pronounced “mu”), and use the letter N to
indicate the size of a finite population. Thus the formula for calculating the mean
of a finite population is given by

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} \tag{2.6.4}$$

or simply

$$\mu = \frac{\sum x_i}{N} \tag{2.6.5}$$

You can think of the mean as the balance point of a set of data. Think of the 
number line as a balance bar and the different values in the data set as cubes of 
equal weight. If you place each of the “cubes” on the “balance bar” at a position 
corresponding to its numerical value, and if you place a fulcrum at a point on the 
balance bar corresponding to the numerical value of the mean, the bar will be in 
perfect balance. 


EXAMPLE 2.6.2 Suppose that we have the following sample of values: 1, 1, 2, 5,
2, 2, 8. By Equation 2.6.3, we find that

$$\bar{x} = \frac{1 + 1 + 2 + 5 + 2 + 2 + 8}{7} = \frac{21}{7} = 3$$

Figure 2.6.1 illustrates the concept that the mean is the balance point for the data.
We can see here that the sum of the signed distances of the observations to the left of
the mean plus the sum of the signed distances of the observations to the right of the
mean equals zero. That is,

$$[3(-1) + 2(-2)] + [1(+2) + 1(+5)] = -7 + (+7) = 0$$


FIGURE 2.6.1 The mean is the balance point of a data set.

This demonstrates another property of the arithmetic mean: The sum of the deviations
$(x_i - \bar{x})$ of a set of observations about their mean is equal to zero. We may
express this property symbolically as follows:

$$\sum (x_i - \bar{x}) = 0$$

For our present example, we have

$$\begin{aligned}
\sum (x_i - \bar{x}) &= (1 - 3) + (1 - 3) + (2 - 3) + (5 - 3) + (2 - 3) + (2 - 3) + (8 - 3) \\
&= (-2) + (-2) + (-1) + (+2) + (-1) + (-1) + (+5) \\
&= 0
\end{aligned}$$
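This property is easy to check numerically. The sketch below uses the seven values of Example 2.6.2; the variable names are illustrative.

```python
# Sample values from Example 2.6.2.
values = [1, 1, 2, 5, 2, 2, 8]

mean = sum(values) / len(values)        # 21 / 7 = 3.0
deviations = [x - mean for x in values]

print(mean)             # 3.0
print(deviations)       # [-2.0, -2.0, -1.0, 2.0, -1.0, -1.0, 5.0]
print(sum(deviations))  # 0.0
```

With data whose mean is not exactly representable in floating-point arithmetic, the computed sum may differ from zero by a tiny rounding error, so a tolerance is advisable when testing this property in code.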

Since $\bar{x}$ is computed from a sample, it is called a statistic, whereas $\mu$, computed
from a population of data, is called a parameter.

The properties of the arithmetic mean include the following: 

1. For a given set of data, there is one, and only one, arithmetic mean. 

2. Its meaning is easily understood. 

3. Since every value goes into its computation, it is affected by the magnitude of 
each value. 

Because of this last property, the arithmetic mean may not be the best measure 
of central tendency when one or two extreme values are present in a set of data. 

The Median The median is that value above which half the values lie and below 
which the other half lie. If the number of items is odd, the median is the value 
of the middle item of an ordered array, when the items are arranged in ascending 
(or descending) order of magnitude. If the number of items is even, none of the 
items has an equal number of values above and below it. In this event the median 
is equal to the mean, or average, of the two middle values. 

EXAMPLE 2.6.3 Five households have annual total incomes of $10,000, $24,500, 
$15,000, $21,500, and $13,000. To find the median total income for these five 
households, we first arrange the values in order of magnitude: 

$10,000   13,000   15,000   21,500   24,500

The median is the middle value, $15,000. Suppose there had been a sixth value
of $9,000. The ordered array would have been

$9,000   10,000   13,000   15,000   21,500   24,500

and the median would have been ($13,000 + $15,000)/2 = $14,000.
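Both cases of Example 2.6.3 can be reproduced with the median function in Python's standard statistics module, which sorts the data internally and averages the two middle values when the number of items is even. The variable names are illustrative.

```python
from statistics import median

# Annual total incomes of the five households in Example 2.6.3.
incomes = [10000, 24500, 15000, 21500, 13000]

# Odd number of items: the median is the middle value of the ordered array.
median_five = median(incomes)

# Even number of items: the median is the mean of the two middle values.
median_six = median(incomes + [9000])

print(median_five)  # 15000
print(median_six)   # 14000.0
```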

Properties of the median include the following: 

1. The median always exists in a set of numerical data. For a given set of data, 
there is only one median. 

2. Unlike the mean, the median is seldom affected by extreme values.

3. The median can be used to characterize qualitative data. For example, a product 
might be marketed in three quality categories—good, better, and best—where the 
quality of the product falling in the “better” category is considered “average.” 

4. The median is easy to calculate unless a large number of values are involved. 

The Mode The mode for ungrouped discrete data is the value that occurs most 
frequently. If all the values in a set of data are different, there is no mode. In the
household income example above, there is no mode because all the values are different.

EXAMPLE 2.6.4 Here is a set of data that does have a mode. A clerical pool 
consists of 10 employees whose ages are 18, 19, 21, 22, 22, 22, 26, 32, 35, and 
36. The most frequently occurring, or modal, age is 22. Some sets of data may 
have two modes, in which case the data are said to be bimodal. If the ages of the 
employees noted above had been 18, 19, 21, 22, 22, 22, 26, 32, 32, and 32, the
two modes would be 22 and 32. A set of data can have more than two modes, 
but the usefulness of indicating a large number of modes is questionable. 
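A short sketch of finding the mode or modes follows. The helper function modes is hypothetical; it returns an empty list when every value occurs only once, which matches the convention in this text that such data have no mode.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list if all values differ."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

ages = [18, 19, 21, 22, 22, 22, 26, 32, 35, 36]
bimodal_ages = [18, 19, 21, 22, 22, 22, 26, 32, 32, 32]
incomes = [10000, 24500, 15000, 21500, 13000]

print(modes(ages))          # [22]
print(modes(bimodal_ages))  # [22, 32]
print(modes(incomes))       # []
```

Returning a list rather than a single number lets the same function report one mode, several modes, or none.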

In symmetrical distributions, the mean and median are identical in value. In 
asymmetrical distributions, these values are not equal. Figure 2.6.2 shows the 
relative positions of the mean, median, and mode for a symmetrical and for some 
asymmetrical distributions. 

Population measures of central tendency are often called location parameters,
since they “locate” the position of a population’s frequency distribution on the
horizontal axis. Consider two population distributions with means $\mu_A$ and $\mu_B$ such
that $\mu_A$ is smaller than $\mu_B$. If you graph the two populations on the same horizontal
axis, the population with mean $\mu_A$ will be located to the left of the population
with mean $\mu_B$.

Of the three measures of central tendency we have discussed, the mean plays 
the most important role in the type of statistics presented in this text. 


FIGURE 2.6.2 Locations of mean, median, and mode for different distributions

Once we have computed the mean of a set of data, we want to know the extent
to which the values differ from this mean. We use the term dispersion to describe
the degree to which a set of values vary about their mean. Other terms that convey
this same concept are variation, scatter, and spread. When the values in a sample
or population are all close to the mean, they exhibit less dispersion than when
some of the values are much larger and/or much smaller than the mean. Figure
2.6.3 shows frequency polygons for two frequency distributions. They both have
the same mean, but they differ with respect to variability. Four descriptive measures
used to express the amount of dispersion present in a set of data are the range,
the average deviation, the variance, and the standard deviation.

FIGURE 2.6.3 Two frequency distributions with equal means but different amounts
of dispersion



The Range The range, as noted earlier, is defined as the difference between the 
largest and the smallest value in a set of data. 

EXAMPLE 2.6.5 Ten typists applying for a job with a bank made the following 
scores on a typing speed test. To get an idea of the variation in typing speeds, 
we compute the range by subtracting the smallest score from the largest score. 
That is, for these data, Range = 89 - 54 = 35. 


Applicant            1    2    3    4    5    6    7    8    9   10
Speed (words/min)   54   55   79   70   86   81   75   89   72   68


The range is easy to compute. However, it’s usually an unsatisfactory measure 
of dispersion, since only two values in a set of data are used in computing it. In 
other words, the range does not use all the information available in the data it is 
supposed to describe. 
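The range calculation of Example 2.6.5 takes one line of Python. The variable names are illustrative.

```python
# Typing speeds (words per minute) of the ten applicants in Example 2.6.5.
speeds = [54, 55, 79, 70, 86, 81, 75, 89, 72, 68]

# Range: largest value minus smallest value.
data_range = max(speeds) - min(speeds)

print(max(speeds), min(speeds), data_range)  # 89 54 35
```

Only the two extreme scores enter the calculation; the other eight values could change without affecting the result, which is the weakness noted above.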

The Average Deviation The average deviation expresses the average amount by 
which the values in a sample or population differ from their mean. When computed 
from a sample, the average deviation takes into account the deviation of each 
value from the mean, $x_i - \bar{x}$. However, the sum of these deviations, and hence 
their mean, is always equal to 0, as shown in Example 2.6.2. Therefore, if it is 
to lead to a valuable measure of dispersion, we must modify the procedure. An 
appropriate modification is to take the mean of the deviations while ignoring the 
signs. That is, we add the absolute values of the deviations and divide by n to 
obtain the average deviation. The procedure is expressed in the following formula: 

$$\text{Average deviation} = \frac{\sum |x_i - \bar{x}|}{n} \tag{2.6.6}$$


EXAMPLE 2.6.6 Let us use the data of Example 2.6.5 to show how to compute the 
average deviation. Since the mean of these data is 72.9, we have 

$$\text{Average deviation} = \frac{|54 - 72.9| + |55 - 72.9| + \cdots + |68 - 72.9|}{10} = \frac{18.9 + 17.9 + \cdots + 4.9}{10} = 9.1$$

Thus we can say that, on the average, the values differ from their mean by 9.1 
words per minute. 
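
The two measures introduced so far can be checked with a few lines of code. The following is a minimal sketch, not part of the original text; the function names `data_range` and `average_deviation` are illustrative choices, and the data are the typing scores of Example 2.6.5.

```python
# Typing-speed scores from Example 2.6.5 (words per minute).
speeds = [54, 55, 79, 70, 86, 81, 75, 89, 72, 68]


def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def average_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean (Equation 2.6.6)."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


print(data_range(speeds))                   # 35
print(round(average_deviation(speeds), 1))  # 9.1
```

Note that `data_range` looks at only two of the ten scores, whereas `average_deviation` uses every score, which is the contrast the text draws between the two measures.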


The average deviation is an intuitively satisfying measure of dispersion. But its 
usefulness is limited because it does not lend itself to further mathematical 
manipulation. Consequently it is seldom used as a measure of dispersion. 


The Variance The variance, like the average deviation, uses all the deviations 
of values from their mean, that is, $x_i - \bar{x}$. In computing the variance, however, 
we avoid negative differences by squaring, rather than by taking absolute values. 
We may compute the variance of a sample of data, then, from the formula 

$$\text{Sample variance} = \frac{\sum (x_i - \bar{x})^2}{n} \tag{2.6.7}$$


Thus the variance is also a kind of average. It is the average of the squares of the 
deviations of the individual values from their mean. The numerator of Equation 
2.6.7 is called the sum of squares about the mean. Note: For any set of values, 
the sum of squared deviations from the mean is smaller than the sum of squared 
deviations from any other point. 

The sample variance $s^2$ has two functions in statistical analysis. First, it is used 
as a measure of the dispersion present in the sample. Second, it is used to estimate 
the variance of the population from which the sample was drawn. As a measure 
of the dispersion present in a sample, the variance computed by Equation 2.6.7 
is perfectly adequate. However, when we use the sample variance as an estimate 
of the population variance, it is better to divide the sum of squares about the mean 
by n - 1 rather than n. (Chapter 6 will discuss this subject more fully.) Since 
the main object of computing a sample variance is usually to estimate the 
population variance, the following formula is almost always used in defining the 
sample variance: 

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \tag{2.6.8}$$


The Standard Deviation The variance is expressed in square units. Suppose that 
the data are measured in feet. The variance is expressed as feet squared. In 
statistical analysis you often want to have a measure of dispersion expressed in 
the same units as the original observations. We obtain such a measure, called the 
standard deviation, by taking the positive square root of the variance. That is, 
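the standard deviation is equal to 

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \tag{2.6.9}$$

EXAMPLE 2.6.7 Let us compute the variance and standard deviation for the typing 
speed scores given in Example 2.6.5. 

$$s^2 = \frac{(54 - 72.9)^2 + (55 - 72.9)^2 + \cdots + (68 - 72.9)^2}{9} = 138.77$$

$$s = \sqrt{138.77} = 11.8$$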


These formulas for the standard deviation and variance are known as definitional 
or conceptual formulas because they are literal representations of the definitions 
and concepts involved. Learning these formulas helps convey the concepts. 

When there is a large number of values involved in the computations, using 
the definitional formulas without a computer may be tedious. There are alternative, 
less cumbersome formulas that we may use, called computational formulas, that 
yield exactly the same results as the definitional formulas. These are not 
approximations, but shortcut formulas algebraically derived from the definitional 
formulas. Their purpose is to lighten your burden, especially when you’re making 
computations on a desk calculator or by hand. 
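
The algebra behind such a shortcut is brief. As a sketch (this derivation is added for illustration and uses only the fact that $\sum x_i = n\bar{x}$), the sum of squares about the mean expands as follows:

$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 = \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$$

Dividing by n - 1, and then multiplying numerator and denominator by n, gives an expression that requires only the sum of the values and the sum of their squares.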

The shortcut formula for the variance is 

$$s^2 = \frac{n\sum x_i^2 - (\sum x_i)^2}{n(n - 1)} \tag{2.6.10}$$



EXAMPLE 2.6.8 Let us compute the variance for the data of Example 2.6.5 by 
Equation 2.6.10, as follows: 


$$s^2 = \frac{10(54^2 + 55^2 + \cdots + 68^2) - (54 + 55 + \cdots + 68)^2}{(10)(9)} = 138.77$$

and the standard deviation, as before, is $s = \sqrt{138.77} = 11.8$. 
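
The claim that the definitional and computational formulas give identical results can be checked numerically. The sketch below is illustrative only; the function names are assumptions made here, and the final line uses Python's standard `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

# Typing-speed scores from Example 2.6.5.
speeds = [54, 55, 79, 70, 86, 81, 75, 89, 72, 68]


def variance_definitional(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def variance_computational(values):
    """Shortcut form: [n * sum(x^2) - (sum x)^2] / [n(n - 1)]."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


s2_def = variance_definitional(speeds)
s2_comp = variance_computational(speeds)
print(round(s2_def, 2), round(s2_comp, 2))     # 138.77 138.77
print(round(math.sqrt(s2_comp), 1))            # 11.8
print(round(statistics.variance(speeds), 2))   # 138.77
```

The shortcut form works with integer sums until the final division, which is one reason it is convenient for hand or calculator work.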
Population Variance and Standard Deviation The variance and standard deviation 
of a population are designated, respectively, by the symbols $\sigma^2$ and $\sigma$. We 
compute the variance $\sigma^2$ of a finite population as follows: 

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{\sum x_i^2 - (\sum x_i)^2 / N}{N} = \frac{N\sum x_i^2 - (\sum x_i)^2}{N \cdot N} \tag{2.6.11}$$


We find the standard deviation of the population by taking the positive square 
root of $\sigma^2$. Suppose, for example, that we have a population of size N = 4 
consisting of the values 10, 4, 3, and 7. By Equation 2.6.11, we find that 

$$\sigma^2 = \frac{4(10^2 + 4^2 + 3^2 + 7^2) - (10 + 4 + 3 + 7)^2}{4(4)} = 7.5$$

The standard deviation, then, is $\sigma = \sqrt{7.5} = 2.74$. 
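
The only change from the sample case is the divisor. As a brief sketch (the helper name `population_variance` is an illustrative choice, not from the text), the small population above can be checked as follows; `statistics.pvariance` is the standard-library function that divides by N.

```python
import math
import statistics

# The finite population used in the text (N = 4).
population = [10, 4, 3, 7]


def population_variance(values):
    """Divide the sum of squares about the mean by N (not N - 1)."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


sigma2 = population_variance(population)
print(sigma2)                       # 7.5
print(round(math.sqrt(sigma2), 2))  # 2.74

# The standard library agrees with the hand computation.
assert abs(statistics.pvariance(population) - sigma2) < 1e-12
```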


We said earlier that, when all the values in a set of data are located near their 
mean, they exhibit a small amount of variation or dispersion. And those sets of 
data in which some values are located far from their mean have a large amount 
of dispersion. Expressing these relationships in terms of the standard deviation, 
which measures dispersion, we can say that when the values of a set of data are 
concentrated near their mean, the standard deviation is small. And when the values 
of a set of data are scattered widely about the mean, the standard deviation is 
large. If the standard deviation computed from a set of data is small, the values 
are concentrated near the mean. And if the standard deviation is large, the values 
from which it is computed are dispersed widely about their mean. 

A useful rule that illustrates the relationship between dispersion and standard 
deviation is given by Chebyshev's theorem, named after the Russian mathematician 
P. L. Chebyshev (1821-1894). This theorem enables us to calculate for any 
set of data (either samples or populations) the minimum proportion of values that 
can be expected to lie within a specified number of standard deviations of the 
mean. The theorem tells us that at least 75% of the values in a set of data can be 
expected to fall within two standard deviations of the mean, at least 88.9% within 
three standard deviations of the mean, and at least 96% within five standard 
deviations of the mean. Chebyshev’s theorem may be stated in general terms as 
follows: 

Given a set of n observations $x_1, x_2, x_3, \ldots, x_n$, at least $(1 - 1/k^2)$ of the 
observations will fall within k (where k > 1) standard deviations of the mean 
of the set of observations. 

Chebyshev’s theorem is applicable to any set of observations, so we can use it 
for either samples or populations. Let us now see how we can apply it in practice. 

Suppose that a set of data has a mean of 150 and a standard deviation of 25. 
We can say that we can expect at least 75% of the values to be between 100 and 
200, at least 88.9% to be between 75 and 225, and at least 96% to be between 
25 and 275. Suppose that another set of data has the same mean, 150, but a 
standard deviation of 10. Applying Chebyshev’s theorem, for this set of data we 
can expect at least 75% of the values to be between 130 and 170, at least 88.9% 
between 120 and 180, and at least 96% to be between 100 and 200. Thus the 
intervals computed for the latter set of data are all narrower than those for the 
former. Therefore we see that for a set of data with a small standard deviation, a 
larger proportion of values will be concentrated near the mean than for a set of 
data with a large standard deviation. We will discuss Chebyshev’s theorem again 
later in the text. Additional bounds, and the corresponding proportion of values 
lying between them, have been tabulated and graphed by Hausner (1977), page 220. 
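
The bounds quoted above follow directly from the expression 1 - 1/k². The sketch below reproduces them for the first data set (mean 150, standard deviation 25); the function names are hypothetical and chosen only for this illustration.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k ** 2


def chebyshev_interval(mean, sd, k):
    """Interval from mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd


for k in (2, 3, 5):
    low, high = chebyshev_interval(150, 25, k)
    print(k, round(100 * chebyshev_min_proportion(k), 1), low, high)
# 2 75.0 100 200
# 3 88.9 75 225
# 5 96.0 25 275
```

Calling `chebyshev_interval(150, 10, k)` instead gives the narrower intervals of the second data set, while the minimum proportions stay the same because they depend only on k.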

2.6.1 An office supply company has a fleet of 100 trucks that it uses for making local 
deliveries. During a recent month the number of miles each truck in a sample of 10 was 
driven was as follows. Compute the following descriptive measures: (a) the mean, (b) 
the median, (c) the mode, (d) the range, (e) the variance, (f) the standard deviation. 

Truck number             1   2   3   4   5   6   7   8   9  10
Miles driven (× 100)    23  34  20  18  30  30  30  38  25  27

2.6.2 The following are the amounts for food and lodging claimed on the expense accounts 
of a sample of 12 salespersons for the same day. For these data, compute: (a) the mean, 
(b) the median, (c) the mode, (d) the range, (e) the variance, (f) the standard 
deviation, (g) the average deviation. 

Salesperson    1   2   3   4   5   6   7   8   9  10  11  12
Amount, $     55  84  63  57  52  70  56  68  74  66  68  64

2.6.3 The following are the prices (in thousands of dollars) of 15 condominiums in a 
sample selected from those in a new complex: 59, 52, 54, 56, 62, 62, 56, 56, 58, 55, 60, 
54, 59, 55, 59. For these data, find: (a) the mean, (b) the median, (c) the mode, (d) 
the range, (e) the variance, (f) the standard deviation. 

2.6.4 The following are the number of miles between home and office of a sample of 10 
people who work for the same firm: 3, 16, 12, 11, 14, 5, 7, 14, 9, 8. For these data, 
find: (a) the mean, (b) the median, (c) the mode, (d) the range, (e) the variance, 
(f) the standard deviation. 

2.6.5 A grocer has determined that the mean daily sales of eggs is 100 dozen, with a 
standard deviation of 10. (a) What minimum percentage of the time can the grocer expect 
to sell between 80 and 120 dozen per day? (b) Between what two bounds can the grocer 
expect daily sales to lie at least 96% of the time? 


2.7 DESCRIPTIVE MEASURES COMPUTED FROM GROUPED DATA 

One sometimes needs to compute the various descriptive measures from data that 
have been grouped into class intervals and presented as a frequency distribution 
such as the one shown in Table 2.4.1. If the data consist of a large number of 
values, and if the computations have to be made by hand or on a calculator, we 
can save ourselves a great deal of labor by grouping the data before we compute 
the descriptive measures. If you have access to a computer, however, having a 
large number of values to analyze poses no particular problem. You can usually 
enter the raw data into the computer with little inconvenience. 

Sometimes original data are inaccessible, but a frequency distribution based on 
the data is available in some published source, such as an annual report. Then, if 
you need descriptive measures, use the techniques given in this section. 

When data are grouped into class intervals, each observation loses its identity. 
We can determine the number of observations falling in each of the various class 
intervals from a frequency distribution, but we cannot determine the actual values 
associated with the observations. For this reason, when we compute descriptive 
measures from grouped data, we must make certain assumptions regarding the 
data. As a consequence of making these assumptions, we must regard the values 
of the descriptive measures computed in this manner as approximations to the true 
values. We will indicate the assumptions that have to be made as we consider 
each measure. 

The Mean When we compute the mean from grouped data, we make the assumption that 
each observation falling within a given class interval is equal to the value of the 
midpoint of that interval. The midpoint of a class interval is called the class mark. 
We obtain the class mark by adding the respective class limits and dividing by 2. 
Consider the data on employee ages in Table 2.4.1. To calculate the mean for 
this frequency distribution, we assume that the 2 observations in the first class 
interval are both equal to 17, the 10 observations in the second class interval are 
equal to 22, and so on. Of course, for a given frequency distribution, it is unlikely 
that all the observations in the class intervals have values actually equal to the 
class marks. We make this assumption with the hope that the errors that it 
introduces will average out. Experience has shown that the assumption is generally 
satisfactory, as are the assumptions made about the other descriptive measures 
computed from grouped data. 

Since each observation takes on the value of the class mark of the interval in 
which it falls, we compute the mean by multiplying each class mark by its 
corresponding frequency. Then we add the resulting products, and divide the total 
by the number of observations. We may express the procedure for sample data 
by 

$$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{n} \tag{2.7.1}$$

where k = the number of class intervals, $x_i$ = the class mark of the ith class 
interval, and $f_i$ = the frequency of the ith class interval. 
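
The grouped-data mean is straightforward to compute once the class marks are known. The following sketch is illustrative; the class limits and frequencies are those of the employee-age distribution (Table 2.4.1) as they appear in the text's intermediate calculations, and the function names are assumptions made here.

```python
# Employee-age distribution: stated class limits and class frequencies.
class_limits = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39),
                (40, 44), (45, 49), (50, 54), (55, 59), (60, 64)]
frequencies = [2, 10, 19, 27, 16, 10, 6, 5, 3, 2]


def class_mark(lower, upper):
    """Midpoint of a class interval: add the class limits and divide by 2."""
    return (lower + upper) / 2


def grouped_mean(marks, freqs):
    """Sum of (class mark * frequency) divided by the number of observations."""
    n = sum(freqs)
    return sum(x * f for x, f in zip(marks, freqs)) / n


marks = [class_mark(lo, hi) for lo, hi in class_limits]
print(marks[:3])                         # [17.0, 22.0, 27.0]
print(grouped_mean(marks, frequencies))  # 34.8
```

Because every observation in a class is treated as equal to its class mark, the result is an approximation to the mean of the original ungrouped values.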

Note that Equation 2.7.1 resembles Equation 2.6.2, the formula for computing 
the mean from ungrouped data. The numerator of Equation 2.7.1 illustrates an 
alternative way of finding the sum of a set of numbers when some of them are 
duplicated. For example, suppose we have the numbers 2, 2, 2, 3, 3, 6, 6, 6, 6. 
We can find the sum of these numbers by simple addition: 

2 + 2 + 2 + 3 + 3 + 6 + 6 + 6 + 6 = 36 

This is the procedure followed in obtaining the numerator of Equation 2.6.2. 
Alternatively, we can find the sum as follows: 

3(2) + 2(3) + 4(6) = 6 + 6 + 24 = 36 

This is the procedure followed in computing the numerator of Equation 2.7.1. 

The mean computed by Equation 2.7.1 is an example of a weighted mean. It 
is a mean of the class marks in which each is weighted by the frequency with 
which it is represented in the frequency distribution. 


We make the same assumption regarding the values assumed by the observations 
when we compute the variance and standard deviation from grouped data. 
Consequently, the definitional or conceptual formula for the sample variance is 

$$s^2 = \frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i}{n - 1} \tag{2.7.2}$$

and the computational formula is 

$$s^2 = \frac{n\sum_{i=1}^{k} x_i^2 f_i - \left(\sum_{i=1}^{k} x_i f_i\right)^2}{n(n - 1)} \tag{2.7.3}$$

We find the standard deviation by taking the square root of $s^2$. 

We may use the data of Table 2.4.1 to show how to compute the mean, variance, 
and standard deviation. Table 2.7.1 gives the necessary intermediate 
calculations. 

TABLE 2.7.1 Intermediate calculations for computing descriptive measures for the frequency distribution of Table 2.4.1

Class interval   Class mark (x_i)   Frequency (f_i)   x_i f_i   x_i^2 f_i
15-19                   17                  2              34         578
20-24                   22                 10             220       4,840
25-29                   27                 19             513      13,851
30-34                   32                 27             864      27,648
35-39                   37                 16             592      21,904
40-44                   42                 10             420      17,640
45-49                   47                  6             282      13,254
50-54                   52                  5             260      13,520
55-59                   57                  3             171       9,747
60-64                   62                  2             124       7,688
Total                                     100           3,480     130,670

For the mean, we have 

$$\bar{x} = \frac{3480}{100} = 34.8$$

For the variance, we have 

$$s^2 = \frac{100(130{,}670) - (3480)^2}{(100)(99)} = 96.63$$

and for the standard deviation, $s = \sqrt{96.63} = 9.8$. 

The formulas in this section have implied that the values used in the calculations 
are those of a sample. To convert these formulas to formulas for computing the 
corresponding descriptive measures from a finite population, substitute $\mu$ for $\bar{x}$, 
$\sigma^2$ for $s^2$, and N for n and n - 1 wherever they appear in Equations 2.7.1 through 
2.7.3. 

The Median

We may also compute the median from grouped data. We defined this measure 
of central tendency earlier as the value, in a set of data, above and below which 
half the values lie. This definition holds when the data are in the form of a 
frequency distribution. But since the individual values in a frequency distribution 
are not identifiable, we cannot find the exact value of the median. 

The median for a frequency distribution is that value, or point, on the 
horizontal axis of the histogram of the distribution at which a perpendicular line 
divides the area of the histogram into two equal parts. 

For this definition to be valid, we must assume that the values in each class interval 
are evenly distributed over the entire interval. Figure 2.7.1 shows the location of 
the median of a set of data represented by a histogram. 

FIGURE 2.7.1 Histogram showing median

The first step in computing the median for a frequency distribution is to 
determine the class interval in which it is located. We do this by finding which 
interval contains the n/2 value. For the employee-ages example, n/2 = 50. Table 2.4.2, 
the cumulative frequency distribution, shows that the fiftieth value is located in 
the fourth class interval. In Table 2.4.2, 31 values are less than 29.5, the true 
upper limit of the third interval. Thus that point is only 50 - 31 = 19 values 
away from the median. Assume that the values in the fourth class interval are 
evenly distributed throughout the interval. Then, since there are 27 values in the 
fourth interval, we can reason that the median is that value which is 19/27ths of 
the way into the fourth interval. To obtain the value of the median, we multiply 
19/27 by 5, the width of the class interval, and add the product to 29.5, the true 
lower limit of the fourth class interval. We have, then, for the employee-ages 
data, 

$$\text{Median} = 29.5 + \frac{19}{27}(5) = 33.0$$

In general, we find the median of a set of data, either a sample or a population, 
from 

$$\text{Median} = L + \frac{j}{f} W \tag{2.7.4}$$

where L = the true lower limit of the class interval in which the median is 
          located 
      j = the number of values still needed to reach the median, after the 
          lower limit of the interval containing the median has been reached 
      f = the frequency in the class interval containing the median 
      W = the width of the class interval 

The Mode

When we want to find the mode of a frequency distribution, we usually just specify 
the modal class, which is defined as the class interval containing the largest 
number of values. The modal class for the employee-ages example is the fourth 
interval, whose limits are 30 and 34. Figure 2.7.2 shows histograms for three sets 
of data: one that has no mode, one that has one mode, and one that has two 
modes. 

Percentiles and Quartiles

The mean and median are special cases of a family of parameters called location 
parameters. These parameters “locate” a distribution on the horizontal axis by 
designating certain positions in terms of the variable assigned to that axis when 
the distribution is graphed. For example, a distribution with a median of 50 is 
located to the right of a distribution with a median of 25 when the two distributions 
are graphed. Other location parameters include percentiles and quartiles. We 
define a percentile as follows: 

Given a set of observations $x_1, x_2, \ldots, x_n$, the pth percentile P is the value of 
X such that p percent of the observations are less than P and (100 - p) percent 
are greater than P. 

To distinguish one of the 99 possible percentiles from another, we use 
appropriate subscripts on P. For example, the tenth percentile is $P_{10}$, the sixty-fifth 
percentile is $P_{65}$, and so on. The median is the fiftieth percentile: $P_{50}$. 
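
Because the median is the fiftieth percentile, the interpolation used for the grouped median can be written once for a general percentile. The sketch below is illustrative, not the text's own procedure statement: the function name `grouped_percentile` is hypothetical, and it assumes class limits stated in whole units, so that each true lower limit lies 0.5 below the stated lower limit (as with 29.5 for the 30-34 class).

```python
# Employee-age distribution (Table 2.4.1): stated lower class limits and frequencies.
lower_limits = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
frequencies = [2, 10, 19, 27, 16, 10, 6, 5, 3, 2]
WIDTH = 5  # class-interval width


def grouped_percentile(p, lower_limits, freqs, width):
    """Interpolate the pth percentile, assuming values are spread evenly within a class."""
    n = sum(freqs)
    target = p * n / 100   # position of the percentile among the ordered values
    cumulative = 0         # number of values below the current class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            true_lower = lower - 0.5   # true lower limit L (assumes whole-unit limits)
            j = target - cumulative    # values still needed after reaching L
            return true_lower + j / f * width
        cumulative += f
    raise ValueError("p must be between 0 and 100")


median = grouped_percentile(50, lower_limits, frequencies, WIDTH)
print(round(median, 1))  # 33.0
```

Passing a different value of p reuses the same steps: locate the class containing the required position, then move the fraction j/f of the way into that class.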

FIGURE 2.7.2 Histograms that differ with respect to number of modes

Suppose that we wish to find the sixtieth percentile of the distribution given in 
Table 2.4.1. Since 60% of 100 (the sample size) is 60, the sixtieth observation, 
when the observations are ordered, is $P_{60}$, the sixtieth percentile. When we consult 
the cumulative frequency distribution (Table 2.4.2), we note that the sixtieth 
observation is in the class interval 35-39, which contains 16 values and is 
preceded by 58 values. We assume that the values falling in an interval are 
uniformly distributed over the interval. To find the sixtieth percentile, we use 
the procedure for computing the median: 

$$P_{60} = 34.5 + \frac{60 - 58}{16}(5) = 35.125$$

We say that 60% of the observations are below and 40% are above 35.125. 

The twenty-fifth percentile is often called the first quartile $Q_1$. The fiftieth 
percentile (the median) is often called the second or middle quartile $Q_2$, and the 
seventy-fifth is called the third quartile $Q_3$. 

Shortcut Formulas for Grouped Data

Computing the mean and variance from grouped data using the formulas already 
presented can be time-consuming. Fortunately, shortcut formulas are available. 
These shortcut formulas employ a coding technique. To use them, we must 
transform the scale of measurement, or horizontal scale, in such a way that the distance 
between any two class marks is one unit long. The new scale, therefore, can be 
called the unit scale, or u scale. 

The first step in making the transformation is to select one of the class marks 
and equate it with zero in the new scale. Subtract this class mark from each of 
the other class marks. Then divide the difference by the width of the class interval. 
Because this makes the distance between class marks one unit, the class marks to 
the left of the one transformed to zero will be -1, -2, -3, and so on. Similarly, 
the class marks to the right of the one transformed to zero will be +1, +2, +3, 
and so on. You can select any class mark as the one to be set equal to zero. 
However, some are better for this purpose than others. A wise choice of class mark 
results in less labor when you compute the mean or variance. Try to select one 
that is near the center of the distribution and that is the midpoint of a class 
that has one of the larger frequencies. 

For the employee ages shown in Table 2.4.1, the class mark of the fourth 
interval, 32, seems a good choice for the one to be set equal to 0. You obtain the 
u scale by subtracting 32 from each class mark and dividing the signed difference 
by the width of the class interval, 5. (Note that 5 is also the distance between any 
two adjacent class marks.) For example, to convert the third class mark to its 
corresponding value on the u scale, we proceed as follows: 

$$\frac{27 - 32}{5} = \frac{-5}{5} = -1$$

In general, we may transform any X value in the original scale to its 
corresponding value in the u scale by the formula 

$$u_i = \frac{x_i - x_0}{W} \tag{2.7.5}$$

where $x_i$ = the value of X for which a u value is desired, $x_0$ = the class mark 
selected to be zero in the u scale, and W = the class-interval width. 
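
The coding transformation, and the way results on the u scale are converted back to original units, can be sketched in code. This is an illustration under stated assumptions: the helper name `grouped_mean_and_variance` is hypothetical, the data are the employee-age class marks and frequencies, and the decoding steps (multiply the coded mean by W and add the chosen class mark; multiply the coded variance by W squared) anticipate the formulas this section develops.

```python
# Employee-age distribution: class marks and frequencies.
class_marks = [17, 22, 27, 32, 37, 42, 47, 52, 57, 62]
frequencies = [2, 10, 19, 27, 16, 10, 6, 5, 3, 2]
X0 = 32  # class mark set equal to zero on the u scale
W = 5    # class-interval width


def grouped_mean_and_variance(values, freqs):
    """Weighted mean and computational sample variance for grouped data."""
    n = sum(freqs)
    total = sum(v * f for v, f in zip(values, freqs))
    total_sq = sum(v * v * f for v, f in zip(values, freqs))
    mean = total / n
    variance = (n * total_sq - total ** 2) / (n * (n - 1))
    return mean, variance


# Code the class marks. Integer division is exact here because the
# class marks are spaced exactly W units apart.
u = [(x - X0) // W for x in class_marks]
print(u)  # [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6]

u_mean, u_var = grouped_mean_and_variance(u, frequencies)
x_mean = u_mean * W + X0   # decode the mean to original units
x_var = u_var * W ** 2     # decode the variance to original units

direct_mean, direct_var = grouped_mean_and_variance(class_marks, frequencies)
print(round(x_mean, 1), round(direct_mean, 1))  # 34.8 34.8
print(round(x_var, 2), round(direct_var, 2))    # 96.63 96.63
```

The coded values are small integers, so the intermediate sums (56 and 414 here, rather than 3,480 and 130,670) are much easier to handle by hand, yet the decoded results match the direct computation.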

Figure 2.7.3 shows the original scale and the u scale for the employee-age data. 
Using the new class marks, we compute a mean $\bar{u}$ in units of the u scale. Values 
of $u_i$ replace values of $x_i$ in the formula given earlier. Thus: 

$$\bar{u} = \frac{\sum_{i=1}^{k} u_i f_i}{n} \tag{2.7.6}$$

We obtain the mean $\bar{x}$ expressed in original units from 

$$\bar{x} = \bar{u}W + x_0 \tag{2.7.7}$$

This is a rearrangement of Equation 2.7.5, in which $x_i$ and $u_i$ are replaced by $\bar{x}$ 
and $\bar{u}$, respectively. The conceptual and computational formulas for the variance 
in terms of u are, respectively, 


$$s_u^2 = \frac{\sum_{i=1}^{k} (u_i - \bar{u})^2 f_i}{n - 1} \tag{2.7.8}$$

and 

$$s_u^2 = \frac{n\sum_{i=1}^{k} u_i^2 f_i - \left(\sum_{i=1}^{k} u_i f_i\right)^2}{n(n - 1)} \tag{2.7.9}$$

To obtain the variance in terms of the original scale, we use the formula 

$$s^2 = s_u^2 \cdot W^2 \tag{2.7.10}$$

FIGURE 2.7.3 Original scale and transformed scale for data in Table 2.4.1 (class marks shown on both scales)

We can illustrate these procedures by applying them to the employee-age data. 
Table 2.7.2 shows the necessary intermediate calculations. 

TABLE 2.7.2 Intermediate calculations for computing the mean and variance from coded data, employee ages

x_i     u_i     f_i     u_i f_i     u_i^2 f_i
17      -3        2        -6           18
22      -2       10       -20           40
27      -1       19       -19           19
32       0       27         0            0
37       1       16        16           16
42       2       10        20           40
47       3        6        18           54
52       4        5        20           80
57       5        3        15           75
62       6        2        12           72
Total           100        56          414

By Equation 2.7.6, the mean is 

$$\bar{u} = \frac{56}{100} = 0.56$$

and, by Equation 2.7.7, 

$$\bar{x} = (0.56)(5) + 32 = 34.8$$

The variance, by Equation 2.7.9, is 

$$s_u^2 = \frac{100(414) - (56)^2}{(100)(99)} = 3.86505$$

and, by Equation 2.7.10, 

$$s^2 = 3.86505(5)^2 = 96.63$$

These results agree with those previously obtained. 

These formulas for computing the mean and variance by using coded data are 
those for the sample mean and variance. To convert them to formulas for the 
mean and variance of a finite population, we merely insert N for n, N for n - 1, 
$\mu$ for $\bar{x}$, and $\sigma^2$ for $s^2$ where appropriate. 


Exercises

2.7.1 Refer to Exercise 2.5.1. Treat the data as a sample and compute the mean, median, 
variance, and standard deviation by the methods of this section. 

2.7.2 Repeat Exercise 2.7.1 using the data of Exercise 2.5.2. 

2.7.3 Repeat Exercise 2.7.1 using the data of Exercise 2.5.3. 

2.7.4 The following are the monthly salaries (× 10) quoted on the last 100 listings for 
junior executive positions filed with an employment agency. (a) Prepare an ordered array, 
a frequency distribution, a histogram, and a frequency polygon. (b) Treat these data as a 
sample and compute the mean, median, variance, and standard deviation. (Suggestion: 
Use intervals of size 10 beginning with 110 and transform to the u scale, as illustrated in 
this section.) 

119  183  147  148  143  153  163  169  149  153
126  191  143  156  145  151  163  166  135  152
139  200  133  151  143  161  157  169  137  143
130  202  136  161  143  173  152  162  142  143
142  190  123  191  124  176  142  155  140  137
140  180  117  184  134  184  143  143  150  144
158  171  122  175  130  187  137  146  156  139
159  164  139  160  140  172  132  143  154  145
162  157  139  151  148  164  145  144  154  148
170  152  149  150  155  151  147  148  154  153

2.7.5 The following is the distribution of commissions earned during a week by 160 
salespersons. (a) Compute the mean, median, variance, and standard deviation for these 
data. (b) Determine the first and third quartiles and the ninety-fifth percentile. 

Class interval, $   100-149  150-199  200-249  250-299  300-349  350-399  400-449  450-499
Frequency                19       25       30       25       20       17       14       10

Summary This chapter was concerned with techniques for organizing and summarizing data. 
People who wish to understand the true nature of their data and to communicate 
to others the information they contain must be able to use these techniques. 

You learned in this chapter that making an ordered array may be a good first 
step toward summarizing data. Frequency distributions, relative frequency 
distributions, and cumulative distributions make possible further organization and 
summarization of data. We can effectively communicate the information contained in 
large sets of data by using graphic procedures such as histograms, frequency 
polygons, and ogives. 

You learned how to compute several descriptive measures that provide useful 
summary information about sets of data. The two broad categories of descriptive 
measures covered here are measures of central tendency (or measures of location) 
and measures of dispersion. You learned to compute and understand the meaning 
of the mean, the median, and the mode as measures of central tendency. The most 
important measures of dispersion you learned about in this chapter are the variance 
and the standard deviation. You learned to compute these descriptive measures 
using both grouped and ungrouped data. 

Review Questions 1. Explain the difference between descriptive statistics and inferential statistics. 

2. Define the following terms: 

(a) variable                  (m) statistic 
(b) random variable           (n) parameter 
(c) quantitative variable     (o) random sample 
(d) qualitative variable      (p) ogive 
(e) discrete variable         (q) relative frequency distribution 
(f) continuous variable       (r) percentile 
(g) population                (s) quartile 
(h) sample                    (t) Sturges’ rule 
(i) ordered array             (u) Chebyshev’s theorem 
(j) frequency distribution    (v) class-interval width 
(k) histogram                 (w) class mark 
(l) frequency polygon         (x) class limits 


3. Compare and contrast the mean, median, and mode as measures of central tendency. 

4. What is meant by the term dispersion ? 

5. Discuss the range, the average deviation, and the variance as measures of dispersion. 

6. Explain how Chebyshev’s theorem can be used to answer questions about a set of data. 

7. Give two reasons why it is useful to be able to compute descriptive measures from 
grouped data. 

8. State the assumptions made in computing each of the following descriptive measures 
from grouped data: (a) the mean, (b) the mode, (c) the variance, (d) the median. 

9. Two regionally distributed general-interest magazines solicit advertising from the same 
clientele. The advertising rates and the total number of subscribers are the same for both 
magazines. The following table shows the age distributions of subscribers to the two 
magazines, in thousands of subscribers. (a) Construct a histogram for each set of data. (b) 
Prepare the relative frequency distribution for each set of data. (c) Prepare the cumulative 
relative frequency distribution for each set of data. (d) Suppose that you are a 
manufacturer of baby food. In which magazine would you advertise? Why? (e) Suppose that you 
are a real estate agent for a retirement community. In which magazine would you advertise? 
Why? 

Age (years)    10-19   20-29   30-39   40-49   50-59   60-69   70-79
Magazine A        10      48      58      38      28      10       8
Magazine B         8      18      18      28      54      46      28


10. A contest sponsored by a radio station drew 10,000 responses. The following table 
shows the age distribution of the contestants. (a) Construct a histogram from these data. 
(b) What is the approximate mean age of the contestants? (c) What proportion of the 
contestants are between 20 and 39 years of age, inclusive? (d) What proportion of the 
contestants are under 30 years of age? (e) What proportion are 40 or over? 

Age (years)            10-14  15-19  20-24  25-29  30-34  35-39  40-44  45-49  50-54  55-59
Contestants (× 100)        4      8     15     19     21     12      8      6      4      3


11. A market-research firm conducts a household survey in a certain community. One 
question asked is “Number of rooms per dwelling unit.” The following table shows the 
results. Compute the mean, median, variance, and standard deviation. 

5  4  4  1  6  3  6  6  6  7  6
2  5  2  7  5  4  6  8  4  5  7
7  6  6  4  3  6  5  5  6  7  6
3  2  5  8  6  6  3  7  7  7  5
5  5  6  5  4  3  4  3  6  5  4


12. The weights, in micrograms, of 20 one-inch specimens of a certain synthetic fiber 
randomly selected from a day’s production of a factory are as follows: 3.7, 3.1, 2.0, 2.8, 
2.3, 4.5, 3.6, 3.0, 3.0, 2.3, 3.1, 2.6, 3.4, 4.8, 3.8, 4.2, 4.5, 3.5, 3.1, 4.6. Compute the 
measures that would be appropriate for describing these data. 

13. In a survey of small bakeries in the Southeast, 10 bakeries report the following 
numbers of employees: 15, 14, 12, 19, 13, 14, 15, 18, 13, 19. Find the mean, median, 
variance, and standard deviation. 

14. A sample of the records of an appliance dealer reveals the following ages (in years) 
of 15 refrigerators at the time of the first service call: 9.1, 2.5, 9.5, 1.1, 7.8, 7.0, 2.2, 
4.4, 8.0, 6.4, 2.9, 7.4, 7.2, 4.3, 5.9. Compute the mean, median, variance, and standard 
deviation. 

15. A random sample of 10 college students reveals the following current balances in their 
bank accounts (in hundreds of dollars): 3.4, 1.8, 1.4, 3.6, 1.8, 3.7, 3.4, 2.9, 4.2, 2.8. 
Compute the mean, median, variance, and standard deviation. 

16. The following are the current prices per share (in dollars) of 20 stocks: 59, 97, 53, 
83, 45, 47, 88, 51, 76, 64, 66, 92, 97, 55, 85, 108, 62, 55, 136, 51. Find the mean, 
median, variance, and standard deviation. 

17. For its quality-control program, a firm that makes spark plugs periodically draws 
samples of size 100 from the assembly line and inspects them. The numbers of defective 
spark plugs found in 25 such samples are as follows: 0, 1, 0, 0, 1, 5, 3, 0, 5, 5, 0, 0, 4, 
4, 0, 3, 5, 0, 2, 3, 4, 2, 1, 5, 4. Find the mean, median, variance, and standard deviation. 

18. In a survey of urban employees designed to learn more about their eating and drinking 
habits, 15 secretaries reported the number of cups of coffee they drank each day to be as 
follows: 4, 2, 1, 3, 5, 6, 6, 3, 3, 2, 4, 3, 2, 0, 1. Compute the measures that would be 
appropriate for describing these data. 

19. The quality-control division of a light-bulb manufacturer conducted forced-life tests 
on a sample of 25 bulbs. The bulbs’ lifetimes, in thousands of hours, were as follows: 
1.1, 1.1, 1.2, 1.1, 1.4, 0.9, 1.2, 1.2, 1.3, 0.8, 1.2, 1.2, 1.2, 1.7, 1.5, 1.2, 0.8, 1.3, 
0.9, 1.7, 1.3, 1.2, 1.2, 1.4, 1.0. Compute the mean, median, variance, and standard 
deviation. 

20. A sample of 20 car registrations selected by a certain county tax office reveals the 
following ages of cars (to the nearest year): 1, 3, 3, 3, 8, 7, 4, 6, 8, 5, 5, 5, 9, 9, 10, 
10, 4, 2, 4, 2. Compute the mean, median, variance, and standard deviation. 

21. The following are the tensile strengths of 15 specimens of plastic: 37, 46, 45, 31, 32, 
48, 44, 45, 35, 42, 42, 33, 35, 26, 47. The data have been coded for ease of calculation. 
Compute the mean, median, variance, and standard deviation. 

22. A sample of 20 cartons of household fuses reveals the following numbers of defectives 
per carton: 7, 6, 3, 4, 3, 9, 4, 4, 5, 5, 9, 4, 6, 9, 7, 0, 6, 1, 3, 0. Compute the mean, 
median, variance, and standard deviation. 

23. An estimator of timber yield selects a sample of 20 trees from a tract of land and 
estimates the following yields in board feet per tree: 385, 317, 326, 309, 595, 228, 241, 
582, 411, 418, 305, 463, 482, 208, 503, 386, 329, 251, 193, 368. Compute the mean, 
median, variance, and standard deviation. 

24. A survey of 20 households reveals the following ages of refrigerators: 1, 1, 2, 2, 2, 
2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7, 8, 10. (a) What is the mean age of the 20 refrigerators? 
(b) What is the median age? (c) Compute the variance and standard deviation of the ages. 

25. The following are the times in seconds that a sample of 16 assembly-line employees 
take to perform a certain operation: 5.9, 7.2, 8.8, 7.0, 8.7, 6.3, 7.1, 5.1, 10.0, 8.5, 8.9, 
9.3, 6.3, 6.9, 9.8, 6.7. Compute the mean, median, variance, and standard deviation. 

26. The following are the number of days elapsing between date of purchase and date of 
return of the first 110 items returned to a department store during the current fiscal year. 
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18 
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29 
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34 
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Summarize these data in the ways that you think would be appropriate for presentation to 
management. 

27. The following are the annual salaries (in thousands of dollars) of the heads of 
household in two communities. 


Community A 


10 

28 

29 

36 

53 

11 

33 

10 

13 

11 

41 

19 

19 

15 

11 

22 

24 

15 

69 

58 

56 

22 

33 

17 

15 

17 

22 

30 

46 

41 

37 

10 

46 

40 

32 

63 

28 

14 

10 

17 

42 

15 

14 

11 

13 

42 

52 

48 

61 

27 

53 

61 

24 

10 

22 

18 

31 

25 

55 

51 

63 

48 

39 

12 

33 

48 

16 

18 

12 

14 

61 

15 

17 

17 

19 

51 

54 

72 

21 

25 

33 

42 

10 

19 

34 

14 

32 

62 

23 

60 

77 

32 

14 

54 

22 

46 

69 

18 

11 

18 

51 

16 

19 

18 

14 


Community B 


75 

69 

94 

47 

60 

56 

28 

14 

62 

61 

25 

59 

32 

41 

58 

57 

36 

31 

39 

56 

18 

60 

18 

76 

53 

29 

71 

49 

98 

49 

71 

50 

31 

73 

76 

20 

54 

67 

73 

68 

45 

69 

74 

73 

76 

42 

31 

70 

16 

29 

26 

50 

14 
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34 
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24 
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35 

60 

72 

72 
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48 

20 

43 

18 

36 

76 

70 

71 

67 

45 

18 

26 

44 

23 

36 

75 

64 

67 

31 

68 

19 

78 

76 

75 

51 

49 

52 

78 

54 

40 

56 


(a) Organize and summarize these data as you would for presentation to the client of a 
market research firm. 





(b) The client, which specializes in expensive home furnishings, wants to open a new 
retail outlet in one of these communities. Which community would you recommend for 
the new store? Why? 

(c) Is the mean or the median a more appropriate measure of central tendency for the data 
of Community A? Community B? 

(d) Which set of data is more variable? 

28. Refer to the population of employed heads of households given in Appendix II. Use 
descriptive measures to compare the widowed or divorced males and females with respect 
to annual income. 








3. Some Elementary 
Probability Concepts 


Chapter Objectives: This chapter introduces you to the 
basic concepts and techniques of probability. These skills 
provide a mechanism that can help you understand much 
of the variability and complexity in the business world. 
This is the first of three chapters that tie together 
descriptive and inferential statistics. After studying this 
chapter and working the exercises, you should be able 
to: 

1. Understand the basic concepts of set theory 

2. Determine the number of permutations that can be 
made from n objects taken r at a time 

3. Determine the number of combinations that can be 
made from n objects taken r at a time 

4. Construct a tree diagram to represent the possible 
choices available to a decision-maker, along with the 
associated possible outcomes 

5. Understand the three recognized views of probability 

6. Understand the elementary properties of probability 

7. Compute the probability of an event 

8. Use Bayes' theorem, an important formula that is 
useful for computing certain probabilities 



3.1 INTRODUCTION 


The next three chapters introduce the basic concepts of statistical inference. We 
can think of these chapters as bridging the gap between descriptive statistics (the 
topic of Chapter 2) and statistical inference. 

The theory of probability is a branch of mathematics concerned with the concept 
and measurement of uncertainty. We cannot cover such a large subject in depth 
in a single chapter. However, since statistical inference is based on probability 
theory, we need a rudimentary understanding of probability in order to understand 
it. The objective of this chapter, then, is to present the basic concepts of probability 
needed for an understanding of statistical inference. We shall introduce the subject 
by discussing probabilities that are based on observed data. 

We obtain observations used in calculating probabilities in a variety of ways. 
The process whereby we obtain these observations is called an experiment. An 
experiment may result in one or more possible outcomes, called events. For 
example, an experiment may consist of determining which brand of detergent a 
homemaker in a certain area prefers. The events may be “prefer Brand A,” 
“prefer Brand B,” and so on. Recording the gasoline consumed by a car during 
a given week is another example. The outcome, or event, in this case is the 
number of gallons of gasoline consumed. 

We may now define probability as follows: 

Probability is a number between 0 and 1 inclusive that measures the likelihood 
of the occurrence of some event. 

The more likely the event, the closer the number is to 1. The more unlikely the 
event, the closer the number is to 0. An event that cannot occur has a probability 
of 0. An event that is certain to occur has a probability of 1. 

Here are some examples of events for which we may be interested in computing 
probabilities: a defective item coming off an assembly line; the purchase of Product 
A; the arrival of a shipment of goods on time; Salesperson A selling more than 
500 items during a given week. 


3.2 SET CONCEPTS AND NOTATION (BASIC NOTIONS) 

In this section we present some ideas and notation from set theory that will help 
you to understand probability and to calculate probabilities. 

Set theory was introduced in the latter part of the nineteenth century by Georg 
Cantor (1845-1918). It is an important tool that is useful in many branches of 
mathematics, including probability theory. For this reason, we include set concepts 
in this chapter. We will cover only a minimum of the basic concepts. You can 
consult the books by Breuer (1958), Stoll (1979), Christian (1965), and Maher 
(1968) for a more complete treatment of the subject. 

A set is a collection of definite, distinct objects called elements or members of 
the set. 





We shall follow the convention of designating a set by a capital letter, usually 
chosen arbitrarily. We may describe a set either by listing all the elements in the 
set or by describing the type of element of which the set is composed. 

The following are examples of sets for which all members of the set are specified: 

A = {Salespersons Jones, Smith, Williams, and Adams} 

B = {Products A, B, C, D, and E} 

C = {Johnson, King, Phillips, Brown} 

K = {Atlanta, Chicago, Denver, Seattle, San Diego} 

For each of these sets, we could describe the elements of which it is composed: 

A = {The four top salespersons in a certain company} 

B = {The five products manufactured by a certain company} 

C = {All the secretaries in the shipping department of a certain company} 

K = {The cities in which a certain firm has branch offices} 

Some additional concepts relating to sets are: 

1. A unit set is a set composed of only one element. 

2. A set that contains no elements is called the empty set, or null set. It is 
designated by the symbol $\emptyset$. 

3. The set of all elements in which there is interest in a given discussion is called 
the universal set. It is designated by the capital letter U. 

4. If every element of A is an element of B, then A is said to be a subset of B. 
Every set is a subset of itself. 

5. The null set is a subset of every other set, by definition. 

6. Two sets are equal if, and only if, they contain the same elements. 

The following are some useful set operations. Where appropriate, we use a 
device called a Venn diagram to portray the relationships among sets. 

The union of two sets A and B is another set, consisting of elements belonging 
to either A or B or both A and B. The symbol $\cup$ designates the union of two sets. 

EXAMPLE 3.2.1 In one city, the radio stations that play country music comprise 
set A, where A = {radio stations 1, 2, 3, 4, 5}. In the same city, the radio stations 
that play rock music make up set B, where B = {radio stations 2, 4, 6, 7}. We may 
write the union of these two sets as $A \cup B$ = {radio stations 1, 2, 3, 4, 5, 6, 7} = {all 
radio stations in the city that play country music or rock music or both}. The sets 
A and B are conjoint because they have at least one element in common. In this 
case the common elements are radio stations 2 and 4. Figure 3.2.1 uses Venn 
diagrams to show the three sets. [Note that $A \cup B$ is the total shaded area in the 
rectangle on the right.] If two sets have no elements in common, they are said to 
be disjoint. 

FIGURE 3.2.1 Venn diagram, showing union of two conjoint sets 

FIGURE 3.2.2 Venn diagram, showing union of two disjoint sets 

EXAMPLE 3.2.2 In a certain city, there are 12 AM radio stations. Four have a 
daytime broadcasting power of 1000 watts or less. The remaining 8 have a daytime 
broadcasting power greater than 1000 watts. From this information we may define 
the following two sets: 

A = {All radio stations in the city with a daytime broadcasting power of 1000 
watts or less} 

B = {All radio stations in the city with a daytime broadcasting power greater than 
1000 watts} 

$A \cup B$ = {All radio stations in the city} 

Here the sets A and B are disjoint, since a radio station cannot belong to both set 
A and set B. Figure 3.2.2 shows the union of two disjoint sets. 

The intersection of two sets A and B is another set, consisting of all elements 
in both A and B. The symbol $\cap$ designates the intersection of two sets. 


EXAMPLE 3.2.3 In Example 3.2.1, the intersection of sets A and B would be written 
$A \cap B$ = {radio stations 2 and 4} = {radio stations that play both country and 
rock music}. In Figure 3.2.1, $A \cap B$ is the doubly shaded area that represents the 
overlapping of sets A and B. The intersection of two disjoint sets is the null set, 
as illustrated by Figure 3.2.2. 
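
The union and intersection in Examples 3.2.1 and 3.2.3 can be checked with a short Python sketch. The variable names are chosen here for illustration only; the station numbers are those given in the examples.

```python
# Sets from Examples 3.2.1 and 3.2.3 (station numbers taken from the text).
country = {1, 2, 3, 4, 5}   # set A: stations that play country music
rock = {2, 4, 6, 7}         # set B: stations that play rock music

union = country | rock          # A union B: in A or B or both
intersection = country & rock   # A intersect B: in both A and B

print(sorted(union))         # [1, 2, 3, 4, 5, 6, 7]
print(sorted(intersection))  # [2, 4]

# Two sets are disjoint when their intersection is the null set.
print(country.isdisjoint(rock))  # False, so A and B are conjoint
```

Because the intersection is not empty, the two sets are conjoint, which matches the statement in Example 3.2.1.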

If the set A is a subset of the universal set U, the complement of A is another 
subset of U and consists of the elements in U that are not in A. The complement 
of A is designated as $\bar{A}$ (called “A bar”). Note that A and $\bar{A}$ are disjoint sets and 
that $A \cup \bar{A} = U$. 


EXAMPLE 3.2.4 Of the 250 workers employed by a certain firm, 150 have been 
with the firm 10 years or longer. Let us designate the 250 employees as the 
universal set U, and the subset of 150 who have been with the firm 10 years or 
longer as the set A. Then the complement of the set A is the set $\bar{A}$, consisting of 
the 100 employees who have been with the firm less than 10 years. Figure 3.2.3 
illustrates the complement of a set. 
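
As a rough illustration of Example 3.2.4, the complement can be computed as a set difference in Python. The employee numbers 1 through 250 are hypothetical labels introduced only for this sketch; the text gives counts, not identities.

```python
# Example 3.2.4 with hypothetical employee labels 1..250.
# Labels 1..150 stand in for the workers with 10 or more years of service.
universal = set(range(1, 251))       # U: all 250 employees
long_service = set(range(1, 151))    # A: 150 employees with 10+ years

complement = universal - long_service   # A bar: elements of U that are not in A

print(len(complement))                          # 100
print(long_service.isdisjoint(complement))      # True: A and A bar are disjoint
print(long_service | complement == universal)   # True: their union is U
```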
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FIGURE 3.2.3 
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We often want to identify sets and subsets that are represented by cross-tabulated 
data, as shown in Table 3.2.1. When we indicate the number of elements in a 
set, we enclose the set symbol in parentheses and prefix it by the letter n. For 
example, n(A) = 30 indicates that there are 30 elements in set A. 


EXAMPLE 3.2.5 Table 3.2.1 classifies 300 trucking firms by location of home office 
and type of cargo hauled. Sets $A_1$ through $A_5$ consist of trucking firms in each of 
the geographic regions. Sets $B_1$ through $B_4$ consist of firms that haul various types 
of cargo. We may specify other sets by using the concepts of intersection, union, 
and complement. For example, the set $B_1 \cap A_4$ consists of trucking firms that 
haul household goods, and whose home office is in the Midwest. The number of 
trucking firms in this set, $n(B_1 \cap A_4)$, is 35. 

The set $B_2 \cup A_2$ consists of firms that haul agricultural products, or are located 
in the Middle Atlantic states, or both. And $n(B_2 \cup A_2) = 78 + 58 - 8 = 128$. 
In calculating $n(B_2 \cup A_2)$, we subtract the 8 firms that are both haulers of 
agricultural products and have home offices in the Middle Atlantic states because this 
subset has been counted twice. That is, these 8 are included in both subtotals 78 
and 58. The complement of $A_5$, $\bar{A}_5$, consists of all trucking firms not located in 
the West, and $n(\bar{A}_5) = 300 - 58 = 242$. 
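
The three counts in Example 3.2.5 can be reproduced from the cell counts of Table 3.2.1. The sketch below is one possible way to do it; the dictionary layout and the helper names `n_row`, `n_col`, and `n_cell` are assumptions made for illustration.

```python
# Cell counts from Table 3.2.1: rows are cargo types B1..B4,
# columns are home-office regions A1..A5 (in that order).
counts = {
    "B1": [20, 40, 25, 35, 30],   # household goods
    "B2": [8, 8, 20, 30, 12],     # agricultural products
    "B3": [7, 4, 4, 5, 10],       # building materials
    "B4": [10, 6, 10, 10, 6],     # general freight
}

def n_row(b):
    """Number of firms in cargo set b (a row total)."""
    return sum(counts[b])

def n_col(a):
    """Number of firms in region set A_a (a column total); a is 1-based."""
    return sum(row[a - 1] for row in counts.values())

def n_cell(b, a):
    """Number of firms in the intersection of cargo set b and region A_a."""
    return counts[b][a - 1]

total = sum(n_row(b) for b in counts)

n_b1_and_a4 = n_cell("B1", 4)                           # intersection
n_b2_or_a2 = n_row("B2") + n_col(2) - n_cell("B2", 2)   # union: subtract the overlap once
n_not_a5 = total - n_col(5)                             # complement

print(n_b1_and_a4, n_b2_or_a2, n_not_a5)  # 35 128 242
```

The union line mirrors the reasoning in the example: the 8 firms in both sets appear in each subtotal, so they are subtracted once.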


TABLE 3.2.1 300 trucking firms classified by location of home office and type of cargo hauled 

                                         Home office location 
                              A_1       A_2        A_3     A_4       A_5 
                              New       Middle 
Type of cargo                 England   Atlantic   South   Midwest   West    Total 
B_1 Household goods            20        40         25      35        30      150 
B_2 Agricultural products       8         8         20      30        12       78 
B_3 Building materials          7         4          4       5        10       30 
B_4 General freight            10         6         10      10         6       42 
Total                          45        58         59      80        58      300 



Exercises 


3.2.1 A firm has 231 employees classified by age and job category as follows. 

                                      Age category 
                           A_1    A_2      A_3      A_4      A_5 
Job category               <20    21-25    26-30    31-35    >35    Total 
B_1 Clerical                20     20       15       10        5      70 
B_2 Custodial                3      6        3        2        1      15 
B_3 Craftsmen               15     30       35       20       10     110 
B_4 Salesmen                 1      5       10        5        2      23 
B_5 Junior executives        0      1        5        2        0       8 
B_6 Executives               0      0        2        2        1       5 
Total                       39     62       70       41       19     231 

Based on this table, explain in words the following sets and give the number of employees 
in each: (a) $B_1 \cap A_5$, (b) $A_2 \cap B_6$, (c) $B_4 \cap A_5$, (d) $A_1 \cup B_6$, (e) $A_3 \cup A_5$, (f) $B_2 \cup B_3$, 
(g) $A_4$, (h) $(A_1 \cup A_2) \cap B_3$, (i) $(B_3 \cup B_4) \cap A_5$. 






How many employees satisfy each of the following conditions? 

(j) The person is neither an executive nor a junior executive. 

(k) The person is both an executive and a junior executive. 

(l) The person is more than 30 years old, and is clerical or custodial. 

(m) The person is a salesperson and/or between 21 and 25 years old, inclusive. 

(n) The person is a craftsman 35 years old or younger. 

(o) The person is a craftsman or a salesperson and is between 21 and 30 years old, 
inclusive. 

(p) The person is clerical or custodial, and is more than 30 years old. 

3.2.2 A salesperson with a three-product line calls on 200 customers over a period of 
time. These 200 customers place orders as follows: 

100 ordered Product A 55 ordered Products A and C 

95 ordered Product B 30 ordered Products B and C 

85 ordered Product C 20 ordered Products A and B and C 

50 ordered Products A and B 

Determine the number of customers who order (a) at least one product, (b) no products, 
(c) exactly one product, (d) exactly two products, (e) exactly three products. Use a Venn 
diagram to illustrate. 

3.2.3 Given two sets: A = {0, 1, 2, 3}, B = {3, 4, 5}. Find $A \cup B$ and $A \cap B$. 

3.2.4 Given the following sets: 

A = {5, 6, 7, 8, 9, 10}    B = {7, 8, 9, 10, 11, 12}    C = {11, 12, 13, 14, 15} 

Find: (a) $A \cup B$ (b) $A \cap B$ (c) $A \cup C$ (d) $A \cap C$ (e) $(A \cup B) \cap C$ 


3.3 COUNTING TECHNIQUES—PERMUTATIONS AND COMBINATIONS 

Suppose that we are computing the probability of some event, or the probability 
of a combination of events, when the total number of possible events is large. 
We find it convenient to have some method of counting the number of such events. 








Let us now look at some techniques to facilitate the counting of events. These 
techniques are useful for counting the number of events comprising the numerator 
and/or the denominator of a probability. 

Factorials 

Given the positive integer n, the product of all the whole numbers from n 
down through 1 is called n factorial and is written n! 

The following are examples of factorials: 

$$10! = 10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1$$

$$5! = 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1$$

$$2! = 2 \cdot 1$$

In general, $n! = n(n - 1)(n - 2)(n - 3) \cdots 1$. By definition, $0! = 1$. Note 
that $10! = 10 \cdot 9!$, $5! = 5 \cdot 4!$, and $n! = n(n - 1)!$. 

By means of factorials, we can find the number of ways objects or persons can 
be arranged in a line. 

EXAMPLE 3.3.1 In a certain company, four secretaries’ desks are arranged in a line 
against a wall. Each secretary can sit at any desk. How many seating arrangements 
are possible? The answer is $4! = 4 \cdot 3 \cdot 2 \cdot 1 = 24$ ways. 

Using a graphic aid called a tree diagram, we can show the possibilities. Let 
us designate the positions as the first, second, third, and fourth positions, and the 
four secretaries as A, B, C, and D. Figure 3.3.1 is a tree diagram representing 
the possible arrangements. 

Permutations 

A permutation is an ordered arrangement of objects. 
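
The seating arrangements of Example 3.3.1 can also be enumerated directly, which is a quick way to confirm the factorial count. This is a minimal sketch using Python's standard library; each tuple produced corresponds to one complete path through a tree diagram of the four positions.

```python
from itertools import permutations
from math import factorial

secretaries = ["A", "B", "C", "D"]

# Every ordered arrangement of the four secretaries across the four desks.
arrangements = list(permutations(secretaries))

print(factorial(4))       # 24
print(len(arrangements))  # 24
print(arrangements[0])    # ('A', 'B', 'C', 'D')
```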

The 24 arrangements of secretaries shown in Figure 3.3.1 are the possible 
permutations of 4 objects taken 4 at a time. In certain situations there may be more 
objects available than positions to be filled. For example, we may wish to know 
how many permutations are possible if we have 5 objects and wish to take only 
2 at a time. We may think of this problem as one in which we have 2 positions 
to fill and 5 objects from which to make selections to fill them. We can get the 
answer using the following line of reasoning: We can fill the first position in one 
of 5 ways, since initially there are 5 objects from which to select. Once we have 
selected an object to fill the first position, there are 4 remaining objects from 
which we can make a selection to fill the second position. Hence there is a total 
of 20 ways (5 x 4) to fill the two positions. Each of these ways of filling the 
two positions is a permutation, and consequently we may say that there are 20 
permutations of 5 objects taken 2 at a time. 

A key word in the definition of a permutation is the word “ordered.” In 
determining the number of possible permutations in a given situation, we say that 
“order counts.” By this we mean, for example, that a pair of adjacent positions 
filled with the objects a and b and the same pair of adjacent positions filled with 
the objects in the order b, then a, are two different permutations of the same two 
objects. Rearranging the order of two objects creates a new permutation. 
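
To make the "order counts" idea concrete, the sketch below lists the permutations of 5 objects taken 2 at a time. The object labels a through e are placeholders chosen for illustration.

```python
from itertools import permutations

objects = ["a", "b", "c", "d", "e"]

# Fill two positions from five objects; each ordered pair is one permutation.
ordered_pairs = list(permutations(objects, 2))

print(len(ordered_pairs))  # 20, that is, 5 ways for the first position times 4 for the second

# (a, b) and (b, a) are counted as two different permutations.
print(("a", "b") in ordered_pairs, ("b", "a") in ordered_pairs)  # True True
```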
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FIGURE 3.3.1 
Tree diagram, 
showing possible 
arrangements of 
four objects taken 
four at a time 
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EXAMPLE 3.3.2 The telephone switchboard in the company referred to in Example 
3.3.1 requires two operators whose chairs (positions) are side by side. When the 
telephone operators go to lunch, two of the four secretaries take their places. If 
we make a distinction between the two operators’ positions, in how many ways 
can the four secretaries fill them? 

We can answer this question by determining the number of possible permutations 
of 4 things taken 2 at a time. There are 4 secretaries, A, B, C, and D, to 
fill the first position. Once that position has been filled, there are only 3 secretaries 
to fill the second position. See Figure 3.3.2. 

The tree diagram in Figure 3.3.2 illustrates that there are $4 \cdot 3 = 12$ possible 
permutations of four things taken two at a time. Suppose that n is the number of 
distinct objects from which an ordered arrangement is to be derived, and r is the 
number of objects in the arrangement. The number of possible ordered arrangements 
is the number of permutations of n things taken r at a time. This is written 
symbolically as ${}_nP_r$. In general, 

$${}_nP_r = n(n - 1)(n - 2) \cdots (n - r + 1) \qquad (3.3.1)$$

We multiply the right-hand side of Equation 3.3.1 by $(n - r)!/(n - r)!$. This 
is equivalent to multiplying by 1. We obtain 

$${}_nP_r = n(n - 1)(n - 2) \cdots (n - r + 1)\,\frac{(n - r)!}{(n - r)!} = \frac{n(n - 1)(n - 2) \cdots (n - r + 1)(n - r)!}{(n - r)!} = \frac{n!}{(n - r)!} \qquad (3.3.2)$$

FIGURE 3.3.2 Tree diagram, showing the number of permutations of four objects taken two at a time 

EXAMPLE 3.3.3 In a stock room, 5 adjacent bins are available for storing 5 different 
items. The stock of each item can be stored satisfactorily in any bin. In how many 
ways can we assign the 5 items to the 5 bins? 

We get the answer by evaluating ${}_5P_5$, which is 

$${}_5P_5 = \frac{5!}{(5 - 5)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{0!} = 120$$

Suppose that there are 6 different parts to be stocked, but only 4 bins are 
available. To find the number of possible arrangements, we need to determine the 
number of permutations of 6 things taken 4 at a time, which is 

$${}_6P_4 = \frac{6!}{(6 - 4)!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2!}{2!} = 360$$
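
Equations 3.3.1 and 3.3.2 can be compared numerically. The sketch below writes each as a small function; the names `n_p_r_product` and `n_p_r_factorial` are introduced here for illustration, and `math.perm` (available in Python 3.8 and later) serves as an independent check.

```python
from math import factorial, perm

def n_p_r_product(n, r):
    """Equation 3.3.1: n(n - 1)(n - 2)...(n - r + 1), a product of r factors."""
    result = 1
    for k in range(n, n - r, -1):
        result *= k
    return result

def n_p_r_factorial(n, r):
    """Equation 3.3.2: n! / (n - r)!"""
    return factorial(n) // factorial(n - r)

# The switchboard, the 5 bins, and the 6 parts in 4 bins.
for n, r in [(4, 2), (5, 5), (6, 4)]:
    print(n, r, n_p_r_product(n, r), n_p_r_factorial(n, r), perm(n, r))
# 4 2 12 12 12
# 5 5 120 120 120
# 6 4 360 360 360
```

The case n = r = 5 relies on the convention 0! = 1, since the denominator of Equation 3.3.2 is then (5 - 5)!.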


Combinations 

A combination is an arrangement of objects without regard to order. 

The number of combinations of n things taken r at a time may be written as $\binom{n}{r}$. 

In Figure 3.3.2, the permutations of 4 things taken 2 at a time consist of the 
following 12 arrangements. 


In this list, six of the arrangements are the same as the other six, except for the 
order in which the letters occur. They are: 

AB AC AD BC BD CD 

BA CA DA CB DB DC 

Sometimes we may not need to distinguish between, for example, arrangement 
AB and arrangement BA. We may consider them as the same subset. That is, 
their order does not count. In that case, we refer to the arrangements as 
combinations. In the case of the 4 secretaries occupying 2 switchboard positions, there 
are 12 permutations, but only 6 combinations. That is, there are two permutations 
for each combination. In general, there are r! permutations for each combination 
of n things taken r at a time. In other words, there are always r! times as many 
permutations as combinations. We express this symbolically as 

$${}_nP_r = r!\binom{n}{r} \qquad (3.3.3)$$

When we solve Equation 3.3.3 for $\binom{n}{r}$, the result is 

$$\binom{n}{r} = \frac{{}_nP_r}{r!}$$

We rewrite the numerator of this last expression as the right-hand side of Equation 
3.3.2 to get the formula for the number of combinations of n things taken r at a 
time: 

$$\binom{n}{r} = \frac{n!}{r!\,(n - r)!} \qquad (3.3.4)$$
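
The relationship between permutations and combinations can be checked by enumeration. In this sketch the function name `n_c_r` is an illustrative choice, and `math.comb` (Python 3.8 and later) is used only as a cross-check.

```python
from itertools import combinations, permutations
from math import comb, factorial

def n_c_r(n, r):
    """Number of combinations of n things taken r at a time: n! / (r! (n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

secretaries = "ABCD"
perms = list(permutations(secretaries, 2))   # order counts: AB and BA are different
combs = list(combinations(secretaries, 2))   # order ignored: AB and BA are the same

print(len(perms), len(combs))     # 12 6
print(n_c_r(4, 2), comb(4, 2))    # 6 6

# There are r! permutations for each combination.
print(len(perms) == factorial(2) * len(combs))  # True
```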











Now let us use Equation 3.3.4 to obtain the number of combinations of 4 things 
taken 2 at a time. As expected, we get 6: 

$$\binom{4}{2} = \frac{4!}{2!\,2!} = \frac{4 \cdot 3 \cdot 2!}{2 \cdot 1 \cdot 2!} = 6$$

EXAMPLE 3.3.4 A perfume manufacturer who makes 10 fragrances wants to prepare 
a gift package containing 6 fragrances. How many combinations of fragrances are 
available? The answer is 

$$\binom{10}{6} = \frac{10!}{6!\,4!} = \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6!}{6!\,4 \cdot 3 \cdot 2 \cdot 1} = 210$$

Permutations of Objects That Are Not All Different 

In our discussion of permutations, we considered the case in which all the objects 
being permuted were different. Sometimes, in the set of objects to be permuted, 
one or more subsets of items are indistinguishable. The problem then is to 
determine how many permutations, or distinguishable arrangements, are possible under 
these circumstances. Logic tells us that the number should be smaller than when 
all objects are different. 

EXAMPLE 3.3.5 A cafeteria on a certain day wants to serve two white, one green, 
and two yellow vegetables. How many distinguishable arrangements of these 
vegetables can be made on the serving line if we distinguish between vegetables 
only on the basis of color? The possible color sequences for the 5 vegetables are 
as follows: 


WWYYG   WYWYG   YYWWG   YWYWG   YWWYG   WYYWG 
WWYGY   WYWGY   YYWGW   YWYGW   YWWGY   WYYGW 
WWGYY   WYGWY   YYGWW   YWGYW   YWGWY   WYGYW 
WGWYY   WGYWY   YGYWW   YGWYW   YGWWY   WGYYW 
GWWYY   GWYWY   GYYWW   GYWYW   GYWWY   GWYYW 

Thus there are 30 possible sequences. If the vegetables had all been different 
colors, there would have been ${}_5P_5 = 5! = 120$ possible color sequences. 

Let’s say that the two white vegetables are cauliflower and potatoes, and the 
two yellow ones are squash and corn. We can distinguish between them on this 
basis by the following method. We indicate the differences by using subscripts, 
$W_1$, $W_2$, $Y_1$, and $Y_2$. We then can take any one of the sequences previously listed 
and obtain three additional sequences by permuting the subscripts. We permute 
two subscripts for white, leaving the yellows unchanged. There are 2! such 
permutations. We obtain 2! additional sequences by permuting the subscripts of the 
yellows and leaving the whites unchanged. Since there is only one green, we are 
not concerned with its effect on the number of permutations. We simply note that 
there are 1! permutations of the single green. (If, however, there were two 
distinguishable greens, we would have to take the resulting 2! possible permutations 
into account.) Let’s use the first sequence, WWYYG, to illustrate. The four 
possible sequences, when we distinguish between the whites and yellows, are 

$W_1W_2Y_1Y_2G$    $W_2W_1Y_1Y_2G$    $W_1W_2Y_2Y_1G$    $W_2W_1Y_2Y_1G$ 

Since we can obtain 3 additional sequences for each of the 30 original sequences, 
there are $30 \cdot 2! \cdot 2! \cdot 1! = 120$ sequences when all objects are different. This 
equals ${}_5P_5 = 120$, the result previously obtained. 

Suppose that ${}_nP_{n_1,n_2,\ldots,n_k}$ equals the number of distinguishable sequences that 
can be formed from n objects taken n at a time. Say that $n_1$ are of one type, $n_2$ 
are of a second type, . . . , $n_k$ are of a kth type, and $n = n_1 + n_2 + \cdots + n_k$. 
We can generalize the above result as follows: 

$$n! = {}_nP_{n_1,n_2,\ldots,n_k} \cdot n_1!\,n_2! \cdots n_k!$$

When we solve for ${}_nP_{n_1,n_2,\ldots,n_k}$, we obtain 

$${}_nP_{n_1,n_2,\ldots,n_k} = \frac{n!}{n_1!\,n_2! \cdots n_k!} \qquad (3.3.5)$$

Now, using our example, we have 

$${}_5P_{2,2,1} = \frac{5!}{2!\,2!\,1!} = 30$$

This is the number of sequences previously listed. 
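
The count of 30 can be confirmed both by the formula and by brute-force enumeration of the color sequences. The function name `distinguishable_permutations` below is an assumption made for this sketch, not notation from the text.

```python
from itertools import permutations
from math import factorial

def distinguishable_permutations(*type_counts):
    """n! / (n_1! n_2! ... n_k!), where n is the sum of the type counts."""
    n = sum(type_counts)
    denominator = 1
    for count in type_counts:
        denominator *= factorial(count)
    return factorial(n) // denominator

# Enumerate all 5! orderings of two whites, two yellows, and one green,
# then keep only the distinct color sequences.
sequences = {"".join(p) for p in permutations("WWYYG")}

print(len(sequences))                          # 30
print(distinguishable_permutations(2, 2, 1))   # 30
print("WWYYG" in sequences)                    # True
```

Collecting the 120 orderings into a set removes the duplicates that arise from swapping the two whites or the two yellows, which is exactly the division by 2! 2! 1! in the formula.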


EXAMPLE 3.3.6 A developer of a residential subdivision has 5 house styles and 
wants to build on 10 adjacent lots. How many distinguishable arrangements are 
possible if the developer decides to build 2 houses of each style? We get the 
answer by evaluating Equation 3.3.5. That is, 

$${}_{10}P_{2,2,2,2,2} = \frac{10!}{2!\,2!\,2!\,2!\,2!} = 113{,}400$$


An important special case of Equation 3.3.5 occurs when there are only two 
types of objects, that is, when r objects are of one type and n - r objects are of 
another type: 

$${}_nP_{r,n-r} = \frac{n!}{r!\,(n - r)!}$$

Thus the number of distinct permutations of n things, of which r are of one type 
and n - r are of another type, is equal to the number of combinations of n 
different things taken r at a time. 
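
A small numerical check of this special case, using n = 5 and r = 2 as arbitrary illustrative values: the distinct arrangements of two objects of one type (labeled X here) and three of another (labeled O) should equal the number of combinations of 5 things taken 2 at a time.

```python
from itertools import permutations
from math import comb

n, r = 5, 2   # illustrative values, not taken from the text

# r objects of one type ("X") and n - r objects of another type ("O").
arrangements = {"".join(p) for p in permutations("X" * r + "O" * (n - r))}

print(len(arrangements))  # 10
print(comb(n, r))         # 10
```

One way to see why the two counts agree: each arrangement is determined by which r of the n positions hold an X.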


Exercises 


3.3.1 Evaluate the following: (a) ,P 2 , (b) 5 P 3 , (c) gP 9t (d) ]0 P 4 , (e) j> 2 , (f) 

( g ) ( 1 7°)’ (h) (a)’* 0 (i)’ (J) ( 4 ) 

3.3.2 An office suite consists of 4 offices located side by side. These are to be occupied 
by 4 junior executives, A, B, C, and D. In how many ways can these executives be 
assigned to the four offices? 









3.3.3 A supervisor has 7 workers available from which to form a 4-member production 
team. How many different teams are possible? 

3.3.4 An advertising artist has 8 photographs from which to choose in designing a full- 
page magazine ad containing 3 photographs. Position on the page is immaterial. How 
many designs using different combinations of photographs are possible? 

3.3.5 The president of a firm that produces 5 different kinds of soap has a sample of each 
displayed in a row on a credenza in her office. (a) In how many different ways can she 
display the 5 products? (b) Suppose that she wants to display only 3 of the products at a 
time (in a row). How many distinguishable arrangements are possible? 

3.3.6 A salesperson has 7 products he wishes to display at a national convention. He can 
display only 4. The order in which he displays the products is immaterial. How many 
displays does he have to choose from? 

3.3.7 An airline has 6 flights it wishes to advertise in one-page ads in a Sunday newspaper. 
It wants to feature a different flight each Sunday for 6 Sundays. (a) How many different 
sequences of ads can the airline run? (b) Suppose that the airline decides at the last minute 
to run only 4 ads. How many sequences of ads can it run? 

3.3.8 A firm has 5 different positions to fill and 15 applicants from which to choose. All 
applicants are equally qualified for all 5 positions. In how many ways can the firm fill the 
positions? 


3.4 DIFFERENT VIEWS OF PROBABILITY 

We can discuss probability from two points of view: the objective and the 
subjective. Until recently, most statisticians have held the objective point of view. 
We can also classify objective probability into two categories: (1) a priori or 
classical probability, and (2) a posteriori or the relative frequency concept. 
Classical probability has its origins in the seventeenth century and in the games of 
chance that were popular at that time. Examples from those games illustrate the 
principles: When a fair coin is tossed, the probability of observing a head is equal 
to one-half. This is also equal to the probability of observing a tail. When a 
perfectly balanced die is rolled, the probability of observing a one equals one- 
sixth. The probability is the same for the other five faces. One can compute the 
probabilities of such events through abstract reasoning. One does not have to rely 
on the results of any experiment. A coin need never be tossed, nor a die rolled, 
in order to be able to calculate these probabilities. We can define probability from 
the classical point of view as follows: 

If some event can occur in N mutually exclusive and equally likely ways, and if 
m of these ways have characteristic E, the probability of the occurrence of E is 
equal to m/N. 

We can write this definition as a formula: 

$$P(E) = \frac{m}{N} \tag{3.4.1}$$

Here P(E) is read “the probability of E.” 
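
Equation 3.4.1 can be sketched in a few lines of Python. This is only an illustration of the classical definition, using the coin and die examples from the text; the function name `classical_probability` is an assumption made for this sketch, and exact fractions are used so that one-sixth stays one-sixth.

```python
from fractions import Fraction


def classical_probability(m: int, N: int) -> Fraction:
    """Return m/N: m favorable ways out of N equally likely, mutually exclusive ways."""
    if N <= 0 or not 0 <= m <= N:
        raise ValueError("need N > 0 and 0 <= m <= N")
    return Fraction(m, N)


# A fair coin: 1 of the 2 equally likely faces is a head.
p_head = classical_probability(1, 2)

# A perfectly balanced die: 1 of the 6 equally likely faces shows a one.
p_one = classical_probability(1, 6)

print(p_head, p_one)  # 1/2 1/6
```

No coin is tossed and no die is rolled here: the probabilities come from counting alone, which is the point of the a priori view.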



Mutually Exclusive and Equally Likely Events 

Key phrases in this definition of probability are mutually exclusive and equally 
likely. 

Two events are mutually exclusive if they cannot occur simultaneously. Two 
events are equally likely when there is no reason to expect one event rather 
than the other to occur. 

The following example illustrates the classical concept of probability. 

EXAMPLE 3.4.1 A magazine advertised that it would give a prize to 50 persons 
selected at random from those returning a completed entry form enclosed as part 
of an advertising brochure. As of the closing date for receipt of entries, the 
magazine had received 10,000 completed entry forms. What is the probability 
that a given individual who entered the contest will win a prize? 

Using the classical concept of probability expressed by Equation 3.4.1: 

$$P(\text{Winner}) = \frac{\text{number of prizes to be awarded}}{\text{total number of entries}} = \frac{50}{10{,}000} = 0.005$$

The Concept of Relative Frequency 

There is another definition of probability, the relative frequency definition. It is 
similar to the classical definition. The relative frequency approach to probability 
depends on the repeatability of some process and the ability to count the number 
of such repetitions along with the number of times that some event of interest 
occurs. From the relative frequency point of view, we can define probability as 
follows. 

Suppose that some process is repeated a large number of times n, and some 
resulting event with characteristic E occurs m times. The relative frequency of 
occurrence of E, m/n, will be approximately equal to the probability of E. 

We can express this definition as a formula: 

$$P(E) = \frac{m}{n} \tag{3.4.2}$$


EXAMPLE 3.4.2 A firm that makes soft drinks wants to estimate the probability 
that a customer of a certain grocery store will buy one of its products. A total of 
1500 customers are observed over a period of time. Of these, 600 bought one or 
more of the firm’s products. On the basis of this information, what is the best 
estimate of the desired probability? 

By Equation 3.4.2, 


$$P(E) = \frac{600}{1500} = 0.4$$


Bear in mind that m/n in Equation 3.4.2 is only an estimate of P(E), the true 
probability of occurrence of event E. 
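
The relative frequency idea can be sketched as follows. The first part reproduces the estimate in Example 3.4.2. The second part is an added illustration, not part of the text: it simulates rolls of a balanced die with Python's pseudo-random generator to show m/n settling near the classical value of one-sixth. The function names and the seed are assumptions of this sketch.

```python
import random
from fractions import Fraction


def relative_frequency(m: int, n: int) -> Fraction:
    """Return m/n, the observed proportion of n repetitions in which E occurred."""
    if n <= 0 or not 0 <= m <= n:
        raise ValueError("need n > 0 and 0 <= m <= n")
    return Fraction(m, n)


# Example 3.4.2: 600 of 1500 observed customers bought the firm's products.
estimate = relative_frequency(600, 1500)
print(float(estimate))  # 0.4


def simulate_die(n_rolls: int, seed: int = 0) -> float:
    """Roll a simulated balanced die n_rolls times; return the proportion of ones."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ones = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 1)
    return ones / n_rolls


# With many repetitions the relative frequency should be close to 1/6,
# but it remains an estimate, not the true probability.
print(simulate_die(60_000))
```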


Subjective Probability 


The subjective approach to probability received attention in the early 1950s as a 
result of the work of Savage (1954), who termed the concept personalistic. This 
view holds that probability measures the confidence that a certain individual has 
in the truth of a certain proposition. 

The subjective view of probability does not depend on the repeatability of any 
process. In fact, we can apply this approach to events that can happen only once. 
An example would be the probability that the Los Angeles Rams will win the 
Super Bowl this year. As another example, consider Salesperson A, who 
assesses the probability of winning the company sales contest this year to be 0.9. 
The sales manager assesses this salesperson’s probability of winning to be only 
0.5. The true probability is unknown. We say only that the salesperson is more 
optimistic than the sales manager. This view of probability is enjoying increased 
attention, especially in an area known as Bayesian decision theory (see Chapter 
15). However, it has not been fully embraced by traditionally oriented statisticians. 


3.5 ELEMENTARY PROPERTIES OF PROBABILITY 

The following elementary properties of probability form the basis of the axiomatic 
approach to probability that was formalized by Kolmogorov (1933). From these 
three properties, one can construct a whole system of probability theory by using 
mathematical logic. The three properties are as follows: 

1. Given some process (or experiment) with n possible outcomes (events) $E_1, E_2, \ldots, E_n$, 
each event $E_i$ is assigned a nonnegative number such that 

$$0 \le P(E_i) \le 1 \tag{3.5.1}$$

The number $P(E_i)$ in Equation 3.5.1 is called the probability of event $E_i$. Simply stated, this property 
says that all events must have a probability that is between 0 and 1, inclusive. 
This is a reasonable requirement, inasmuch as the concept of a negative probability 
has little intuitive appeal. 

2. If all possible events $E_1, E_2, \ldots, E_n$ are mutually exclusive, the sum of their 
probabilities is equal to 1. 

$$P(E_1) + P(E_2) + \cdots + P(E_n) = 1 \tag{3.5.2}$$

This is the property of exhaustiveness. It refers to the fact that the observer of a 
probabilistic experiment must allow for all possible events. When all these events 
are taken together, their total probability is 1. The requirement that the events be 
mutually exclusive specifies that the events $E_1, E_2, \ldots, E_n$ be disjoint, that is, 
that they do not overlap. 

3. Given any two mutually exclusive events $E_i$ and $E_j$, the probability of the 
occurrence of either $E_i$ or $E_j$ is equal to the sum of their probabilities. 

$$P(E_i \text{ or } E_j) = P(E_i) + P(E_j) \tag{3.5.3}$$



To see the implication of this property more easily, think about what would be 
true if the two events were not mutually exclusive. What if $E_i$ and $E_j$ could happen 
at the same time? When we tried to find the probability of the occurrence of either 
$E_i$ or $E_j$, we would find the problem of an overlap. Then it would not be so easy 
to calculate the probability. Given that $E_i$ and $E_j$ cannot occur at the same time, 
that is, that they are mutually exclusive, we simply add the individual probabilities to 
find $P(E_i \text{ or } E_j)$. 

EXAMPLE 3.5.1 A group of employees consists of Employees A, B, and C. Each 
has the same chance (probability) of being selected for promotion. One of them 
is definitely going to be promoted. What is the probability that Employee A will 
be promoted? Employee B? Employee C? 

Each employee has one chance out of three of being promoted. Thus the 
probabilities are 1/3 for Employee A, 1/3 for Employee B, and 1/3 for Employee C. 
We can see that each of these probabilities is a number between 0 and 1. 

Since only one person is to be promoted, the three possible events are mutually 
exclusive, and since one of the three is definitely going to be promoted, they cover 
all the possibilities. Therefore their probabilities sum to 1. That is, 1/3 + 1/3 + 1/3 = 1. 

The probability that either Employee A or Employee B will be promoted is 
equal to the sum of their individual probabilities, since the events are mutually 
exclusive. Thus the probability that either Employee A or Employee B will be 
promoted is equal to 1/3 + 1/3 = 2/3. 
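
The three properties can be checked mechanically for Example 3.5.1. The sketch below is a minimal illustration, assuming the events are supplied as a dictionary of exact fractions for a set of mutually exclusive events that covers every possible outcome; the name `satisfies_properties` is hypothetical.

```python
from fractions import Fraction


def satisfies_properties(probs: dict) -> bool:
    """Check properties 1 and 2 for mutually exclusive events covering all outcomes."""
    in_range = all(0 <= p <= 1 for p in probs.values())   # property 1 (Eq. 3.5.1)
    sums_to_one = sum(probs.values()) == 1                # property 2 (Eq. 3.5.2)
    return in_range and sums_to_one


# Example 3.5.1: Employees A, B, and C are equally likely to be promoted.
promotion = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}
print(satisfies_properties(promotion))  # True

# Property 3 (Eq. 3.5.3): the events are mutually exclusive, so probabilities add.
p_a_or_b = promotion["A"] + promotion["B"]
print(p_a_or_b)  # 2/3
```

Exact fractions are used because floating-point thirds would not sum to exactly 1.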


3.6 CALCULATING THE PROBABILITY OF AN EVENT 

Let us now use the concepts and techniques introduced in the previous sections 
to solve practical problems involving the calculation of specific probabilities. 

We must first distinguish between two types of probability, conditional 
probability and unconditional probability. Suppose that all the possible outcomes of 
some experiment constitute the universal set. We compute the probability of the 
occurrence of an event by forming the ratio of the number of favorable outcomes 
to the number of all possible outcomes. This probability is called an unconditional 
probability. The discussion in this section assumes that each outcome is equally 
likely. 

At times the set of all possible outcomes may constitute a subset of the universal 
set. The population of interest may be reduced by some set of conditions not 
applicable to the total population. When we calculate probabilities with a subset 
of the universal set as the denominator, the result is a conditional probability. 

EXAMPLE 3.6.1 Table 3.6.1 shows 10,000 household appliances cross-classified 
by color and style. Suppose that we want to calculate the probability that an 
appliance picked at random will be white. This is an unconditional probability, 
since we have placed no restrictions on the set of all possible outcomes. We 
compute the probability as follows: 











$$P(C_1) = \frac{n(C_1)}{n(U)} = \frac{4600}{10{,}000} = 0.46$$

TABLE 3.6.1 Household appliances classified by color and style 

                              Color 
Style     $C_1$ (White)   $C_2$ (Copper)   $C_3$ (Green)     Total 
$S_1$         1,400             450              900         2,750 
$S_2$         1,300             350              800         2,450 
$S_3$           900             700              750         2,350 
$S_4$         1,000             250            1,200         2,450 
Total         4,600           1,750            3,650        10,000 


Note that we calculate the probability by forming the ratio of two numbers. The 
denominator consists of the total number of appliances that could be selected, 
$n(U)$. The numerator is the number of appliances with the characteristic of interest, 
$n(C_1)$. 
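
As a sketch of this ratio, the counts of Table 3.6.1 can be stored in a dictionary and summed. The keys `"S1"` to `"S4"` and `"C1"` to `"C3"` are labels chosen for this illustration (the label of the fourth style row is assumed to be S4).

```python
from fractions import Fraction

# Counts from Table 3.6.1, keyed by (style, color).
counts = {
    ("S1", "C1"): 1400, ("S1", "C2"): 450, ("S1", "C3"): 900,
    ("S2", "C1"): 1300, ("S2", "C2"): 350, ("S2", "C3"): 800,
    ("S3", "C1"): 900,  ("S3", "C2"): 700, ("S3", "C3"): 750,
    ("S4", "C1"): 1000, ("S4", "C2"): 250, ("S4", "C3"): 1200,
}

# Denominator: every appliance that could be selected, n(U).
n_U = sum(counts.values())

# Numerator: appliances with the characteristic of interest (white), n(C1).
n_C1 = sum(n for (style, color), n in counts.items() if color == "C1")

p_C1 = Fraction(n_C1, n_U)
print(n_C1, n_U, float(p_C1))  # 4600 10000 0.46
```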


Conditional Probability 

Now suppose that we reduce the set of all possible outcomes to the $S_1$ appliances. 
What is the probability that an appliance picked at random will be white, given 
that it is style $S_1$? This is a conditional probability. In Table 3.6.1 there are 2750 
members of set $S_1$. Of these, 1400 are white. These 1400 belong to the set $C_1 \cap S_1$. 
The probability sought is given by $P(C_1|S_1)$, where the vertical line between 
$C_1$ and $S_1$ is read “given.” The entire expression is read “the probability of $C_1$ 
given $S_1$.” Thus we have 

$$P(C_1|S_1) = \frac{n(C_1 \cap S_1)}{n(S_1)} = \frac{1400}{2750} = 0.51$$


We may compute the conditional probability $P(C_1|S_1)$ in another way. Divide 
both the numerator and the denominator in the preceding equation by $n(U)$, the 
number in the universal set, or the total number of appliances. The result is 

$$P(C_1|S_1) = \frac{n(C_1 \cap S_1)/n(U)}{n(S_1)/n(U)}$$

Here the numerator is the probability that an appliance picked at random from all 
appliances will be both white and style $S_1$. We can write it $P(C_1 \cap S_1)$. The 
denominator is the probability that an appliance picked at random will be style 
$S_1$. We can write it $P(S_1)$. We can rewrite the entire expression as 

$$P(C_1|S_1) = \frac{P(C_1 \cap S_1)}{P(S_1)} = \frac{1400/10{,}000}{2750/10{,}000} = \frac{0.14}{0.275} = 0.51$$
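
The two routes to $P(C_1|S_1)$ can be compared in a short sketch. The variable names are assumptions made for this illustration; the counts are those of Table 3.6.1.

```python
from fractions import Fraction

n_U = 10_000          # all appliances
n_S1 = 2_750          # appliances of style S1
n_C1_and_S1 = 1_400   # appliances that are both white and style S1

# Route 1: ratio of counts within the reduced set S1.
p_by_counts = Fraction(n_C1_and_S1, n_S1)

# Route 2: ratio of two probabilities computed over the universal set.
p_joint = Fraction(n_C1_and_S1, n_U)   # P(C1 and S1) = 0.14
p_S1 = Fraction(n_S1, n_U)             # P(S1) = 0.275
p_by_probs = p_joint / p_S1

print(p_by_counts, round(float(p_by_counts), 2))  # 28/55 0.51
print(p_by_counts == p_by_probs)                  # True
```

Dividing numerator and denominator by the same number, n(U), cannot change the ratio, which is why the two routes agree exactly.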



The following example further illustrates conditional probability. 

EXAMPLE 3.6.2 A class consists of 30 students, of whom 10 are men and 20 are 
women. Five of the women and none of the men are out-of-state students. Thus 
the events “male” and “out-of-state” are mutually exclusive. 

(a) A student is selected at random from the class. What is the probability that 
the one selected will be an out-of-state student? The answer is 5/30 = 0.17. 

(b) Now suppose that the student selected is a woman. What is the probability 
that she will be an out-of-state student? Men are now no longer of interest, since 
one of the 20 women has been drawn. Since 5 of the women are out-of-state 
students, the probability that the one selected is from out of state is 5/20 = 0.25. 
The occurrence of one event (a woman) increased the probability of the occurrence 
of the second event (an out-of-state student). 

(c) Suppose that the student selected is a man. What is the probability that he is 
an out-of-state student? Now women are eliminated from consideration in the 
calculation of the probability. Since no men are from out of the state, the 
probability we seek is 0/10 = 0. This time, the occurrence of the first event decreases 
the probability of the occurrence of the second event. 

Now think about the three probabilities we have just computed. Why are they 
different? The probability computed in part (a) is an unconditional probability. 
No “conditions” were stipulated. The denominator of the probability consisted 
of all 30 students in the class. 

The probabilities computed in parts (b) and (c) are conditional probabilities. 
We computed them under the stipulated “condition” that some preceding event 
had occurred. The denominators for these probabilities are subsets of the original 
30 students. Figure 3.6.1 shows the situation described in this example. 
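
The shrinking denominator in Example 3.6.2 can be made concrete by listing the class and filtering it. This is a sketch under the assumption that each student is recorded as a (sex, out-of-state) pair; the helper name `prob_out_of_state` is hypothetical.

```python
from fractions import Fraction

# Example 3.6.2: 10 men (none out-of-state), 20 women (5 out-of-state).
students = (
    [("M", False)] * 10
    + [("F", True)] * 5
    + [("F", False)] * 15
)


def prob_out_of_state(group) -> Fraction:
    """Proportion of out-of-state students within the given group."""
    group = list(group)
    return Fraction(sum(1 for _, out in group if out), len(group))


# (a) Unconditional: the denominator is the whole class of 30.
p_all = prob_out_of_state(students)

# (b), (c) Conditional: the denominator is a subset of the class.
p_given_woman = prob_out_of_state(s for s in students if s[0] == "F")
p_given_man = prob_out_of_state(s for s in students if s[0] == "M")

print(p_all, p_given_woman, p_given_man)  # 1/6 1/4 0
```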

The following is a general definition of conditional probability: 

The conditional probability of A given B is equal to the probability of $A \cap B$ 
divided by the probability of B, provided that the probability of B is not 0. 

That is, 

$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) \neq 0 \tag{3.6.1}$$

FIGURE 3.6.1 Situation for calculation of conditional probabilities described in Example 3.6.2 

Marginal Probability 

To illustrate the concept of marginal probability, let us again refer to Example 
3.6.1. When we ask for the probability that an appliance in Table 3.6.1 is style 
$S_1$, we are asking for a marginal probability. Interest centers on a probability 
associated with a marginal total. We disregard any other criterion of classification. 
When we compute 

$$P(S_1) = \frac{2750}{10{,}000} = 0.275$$

this implies that we are not interested in the colors. Similarly, if we are interested 
in the probability that an appliance picked at random is white, we use the marginal 
total, 4600, and ignore the style classification. This suggests the following 
definition: 

When we ignore one or more criteria of classification in computing a 
probability, the resulting probability is a marginal probability. 

The Addition Rule 

The probability of the occurrence of either one or the other of two mutually 
exclusive events is equal to the sum of their individual probabilities. When two 
events are not mutually exclusive, we use the addition rule, which may be stated 
as follows: 

Given two events A and B, the probability that event A or event B or both 
occur is equal to the probability that event A occurs, plus the probability that 
event B occurs, minus the probability that both events occur. 

We can write this as 

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \tag{3.6.2}$$

Refer to Table 3.6.1 again. We compute the probability that an appliance picked 
at random will be either style $S_1$ or white or both. Using Equation 3.6.2, we 
obtain 

$$P(C_1 \cup S_1) = P(C_1) + P(S_1) - P(C_1 \cap S_1) = \frac{4600}{10{,}000} + \frac{2750}{10{,}000} - \frac{1400}{10{,}000} = \frac{4600 + 2750 - 1400}{10{,}000} = \frac{5950}{10{,}000} = 0.595$$


The 1400 appliances that are both white and style $S_1$ are included in the 2750 that 
are style $S_1$ as well as in the 4600 that are white. Since, in computing this 
probability, we have added these 1400 into the numerator twice, we must subtract 
them once to overcome the effect of duplication, or overlapping. 
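
The double counting can be seen directly with Python sets. In this sketch the 10,000 appliances are simply numbered 0 to 9999, an arbitrary labeling assumed for illustration, so that the white appliances and the style $S_1$ appliances overlap in exactly 1400 items.

```python
from fractions import Fraction

n_U = 10_000

# Arbitrary numbering of the appliances (an assumption of this sketch):
white_and_s1 = range(0, 1400)     # 1400 both white and style S1
white_only = range(1400, 4600)    # 3200 white but not style S1
s1_only = range(4600, 5950)       # 1350 style S1 but not white

C1 = set(white_and_s1) | set(white_only)   # 4600 white appliances
S1 = set(white_and_s1) | set(s1_only)      # 2750 style S1 appliances


def p(event: set) -> Fraction:
    """Probability of an event as (number of members) / n(U)."""
    return Fraction(len(event), n_U)


direct = p(C1 | S1)                      # count the union once
by_rule = p(C1) + p(S1) - p(C1 & S1)     # Equation 3.6.2

print(float(direct), direct == by_rule)  # 0.595 True
```

A set union counts each shared appliance once, so no correction is needed there; the formula has to subtract the overlap because it adds the two totals separately.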




The Multiplication Rule 

Another useful rule for computing the probability of an event is the multiplication 
rule. This rule is suggested by the definition of conditional probability. Recall 
that we compute the conditional probability of A given B from 

$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) \neq 0$$

We may rewrite this equation to obtain 

$$P(A \cap B) = P(B)P(A|B) \tag{3.6.3}$$

This is the symbolic statement of the multiplication rule. In words it is: 

The probability of the joint occurrence of event A and event B is equal to the 
conditional probability of A given B times the marginal probability of B. This 
type of probability, which is the probability of the simultaneous occurrence of 
two events, is called a joint probability. 

To illustrate the use of the multiplication rule, let us use Equation 3.6.3 to find 
the probability that an appliance picked at random from all appliances is both 
white and style $S_1$. The desired probability is 

$$P(C_1 \cap S_1) = P(S_1)P(C_1|S_1) = \frac{2750}{10{,}000} \cdot \frac{1400}{2750} = \frac{1400}{10{,}000} = 0.14$$

We can also calculate this joint probability directly from the data of Table 
3.6.1, as follows. 

$$P(C_1 \cap S_1) = \frac{n(C_1 \cap S_1)}{n(U)} = \frac{1400}{10{,}000} = 0.14$$

Independent Events 

Suppose that, in Equation 3.6.3, we are told that event B has occurred, but that 
this fact has no effect on the probability of A. That is, suppose that the probability 
of event A is the same regardless of whether or not B occurs. In this situation, 
$P(A|B) = P(A)$. We say that A and B are independent events. The multiplication 
rule for two independent events, then, may be written as 

$$P(A \cap B) = P(B)P(A) \tag{3.6.4}$$

Note that, when two events are independent, each of the following statements is 
true: 

$$P(A|B) = P(A), \qquad P(B|A) = P(B), \qquad P(A \cap B) = P(A)P(B)$$

In fact, two events are not independent unless all these statements are true. 
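
The multiplication rule and the independence condition can be sketched together using the Table 3.6.1 figures. The function names are assumptions of this illustration. Note that the second check applies the independence condition to "white" and "style $S_1$", an application added here rather than one worked in the text: since 0.46 × 0.275 = 0.1265 is not 0.14, these two events are not independent.

```python
from fractions import Fraction


def joint_from_conditional(p_b: Fraction, p_a_given_b: Fraction) -> Fraction:
    """Multiplication rule (Eq. 3.6.3): P(A and B) = P(B) * P(A|B)."""
    return p_b * p_a_given_b


def are_independent(p_a: Fraction, p_b: Fraction, p_a_and_b: Fraction) -> bool:
    """Independence condition (Eq. 3.6.4): P(A and B) = P(A) * P(B)."""
    return p_a_and_b == p_a * p_b


p_S1 = Fraction(2750, 10_000)          # marginal probability of style S1
p_C1 = Fraction(4600, 10_000)          # marginal probability of white
p_C1_given_S1 = Fraction(1400, 2750)   # conditional probability of white given S1

p_C1_and_S1 = joint_from_conditional(p_S1, p_C1_given_S1)
print(float(p_C1_and_S1))                         # 0.14
print(are_independent(p_C1, p_S1, p_C1_and_S1))   # False
```

Exact fractions matter for the equality test; with floating-point values a tolerance would be needed instead.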

As an illustration of independence, consider the following example. 

EXAMPLE 3.6.3 A government agency employs 100 clerk-typists, classified by sex 
and marital status as shown in Table 3.6.2. If an employee is picked at random 
from the 100 employees, the probability that he or she is single is P(S) = (24 + 
16)/100 = 40/100 = 0.4. Now let us compute the probability that an employee 
picked at random is single, given that the employee is male. 

TABLE 3.6.2 Data for Example 3.6.3 

               Marital Status 
Sex        Single    Married    Total 
Male           16         24       40 
Female         24         36       60 
Total          40         60      100 

Using the formula for computing a conditional probability (Equation 3.6.1), 

$$P(S|M) = \frac{P(S \cap M)}{P(M)} = \frac{16/100}{40/100} = 0.4$$

Thus the additional information that a randomly selected employee is male does 
not alter the probability that he will be single, and P(S) = P(S|M). Consequently 
we can say that the two events (being single and being male) are, for this group, 
independent. 

Joint Probability of Two Independent Events 

We can show the calculation of the joint probability of two independent events 
by means of a tree diagram. 


EXAMPLE 3.6.4 In a certain factory, an average of 1 out of every 20 items coming 
off an assembly line is defective. The quality-control supervisor wants to know 
the probabilities of the following joint events for two items randomly selected 
from the assembly line. 

1. Both items are defective. 

2. The first item is defective and the second is not. 

3. The first item is not defective and the second is. 

4. Neither item is defective. 


The quality-control supervisor believes that whether or not a given item is 
defective is independent of whether or not any other item is defective. 

We designate the probability of a defective item by $P(D)$ and the probability 
of a nondefective item by $P(\bar{D})$. We have $P(D) = 1/20$ and $P(\bar{D}) = 19/20$. 
If the assumption of independence is correct, these probabilities hold for every 
item drawn from the assembly line. The four desired probabilities, then, are: 

1. $P(D \cap D) = P(D)P(D) = (1/20)(1/20) = 1/400$ 

2. $P(D \cap \bar{D}) = P(D)P(\bar{D}) = (1/20)(19/20) = 19/400$ 

3. $P(\bar{D} \cap D) = P(\bar{D})P(D) = (19/20)(1/20) = 19/400$ 

4. $P(\bar{D} \cap \bar{D}) = P(\bar{D})P(\bar{D}) = (19/20)(19/20) = 361/400$ 


FIGURE 3.6.2 Tree diagram for Example 3.6.4 


The total of the probabilities for these four mutually exclusive events is 

$$\frac{1}{400} + \frac{19}{400} + \frac{19}{400} + \frac{361}{400} = \frac{400}{400} = 1$$

Figure 3.6.2 is a tree diagram representing these probabilities. 
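
The branches of the tree can be enumerated programmatically. The sketch below assumes independence, as the supervisor does, and uses the label `"N"` (chosen for this illustration) for a nondefective item.

```python
from fractions import Fraction
from itertools import product

# Probabilities for a single item: D = defective, N = not defective.
p = {"D": Fraction(1, 20), "N": Fraction(19, 20)}

# Each path through the tree is an ordered pair (first item, second item).
# Under independence, the joint probability is the product along the path.
joint = {
    (first, second): p[first] * p[second]
    for first, second in product(p, repeat=2)
}

for outcome, prob in joint.items():
    print(outcome, prob)

# The four paths are mutually exclusive and cover every possibility.
total = sum(joint.values())
print(total)  # 1
```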

Complementary Events 

We must consider one more concept here. The probability of an event A is equal 
to 1 minus the probability of its complement $\bar{A}$, and therefore 

$$P(\bar{A}) = 1 - P(A) \tag{3.6.5}$$

This result follows from the third property of probability, since the event A and 
its complement $\bar{A}$ are mutually exclusive. 
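
A common use of Equation 3.6.5 is to find the probability of "at least one" by way of "none." The sketch below applies it to the figures of Example 3.6.4; the question about at least one defective item is an illustration added here and is not posed in the text.

```python
from fractions import Fraction


def complement(p_event: Fraction) -> Fraction:
    """Equation 3.6.5: probability of the complement of an event."""
    return 1 - p_event


# From Example 3.6.4: probability that neither of two items is defective.
p_neither = Fraction(361, 400)

# "At least one defective" is the complement of "neither defective".
p_at_least_one = complement(p_neither)
print(p_at_least_one)  # 39/400

# Cross-check: add the three mutually exclusive paths that contain a defective.
by_addition = Fraction(1, 400) + Fraction(19, 400) + Fraction(19, 400)
print(p_at_least_one == by_addition)  # True
```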

Exercises 

3.6.1 The following table classifies 1000 college graduates by area of concentration in 
college and type of employer for whom they went to work after graduation. (a) A graduate 
is picked at random from this group. Calculate the probability that he or she was: (1) an 
accounting major, (2) employed by a banking and finance firm, (3) an accounting major 
employed by a banking and finance firm, (4) an accounting major given that he or she 
was employed by a banking and finance firm, (5) an accounting major or an engineering 
major, (6) an accounting major or employed by a banking and finance firm. (b) Evaluate 
the following probabilities: (1) $P(B_3)$, (2) $P(A_4)$, (3) $P(B_3 \cap A_4)$, (4) $P(A_4)P(B_3|A_4)$, (5) 
$P(A_4 \cup B_3)$, (6) $P(A_4 \cup A_3)$, (7) $P(A_1|B_1)$. (c) Is type of employer independent of area of 
concentration? How do you support your answer mathematically? 

                                                Type of employer 
Major                       Public       Banking &   Electronics   Merchandising   All others   Total 
                            accounting   finance 
                            $A_1$        $A_2$       $A_3$         $A_4$           $A_5$ 
Accounting ($B_1$)              60           15          10              5             10         100 
General business ($B_2$)        20           50          10             65              5         150 
Humanities ($B_3$)               1            2           2             10             60          75 
Social science ($B_4$)           2           20           8             20             70         120 
Engineering ($B_5$)              2            5         188              5             50         250 
All others ($B_6$)              30           50          55             60            110         305 
Total                          115          142         273            165            305       1,000 


3.6.2 A company has two vacancies at the junior executive level. Ten people, seven men 
and three women, are eligible and equally qualified. The company has decided to draw 
two names at random from the list of eligibles. What is the probability that: (a) both 
positions will be filled by women? (b) at least one of the positions will be filled by a 
woman? (c) neither of the positions will be filled by a woman? 

3.6.3 There are 300 homes in a neighborhood. At 100 of these, on a certain evening, no 
one is at home. Of the remaining homes, the occupants of 50 will not participate in 
telephone surveys. On a particular evening, a person conducting a telephone survey calls 
one of these homes at random. What is the probability that: (a) the surveyor will call a 
home where no one is present? (b) the surveyor will call a home in which someone is 
present, but the person will not participate in the survey? (c) the call will result in 
participation in the survey? 

3.6.4 In a certain firm, the probability that an employee picked at random will be over 
30 years of age is 0.58. What is the probability that an employee picked at random will 
be 30 years old or younger? 

3.6.5 Refer to Exercise 3.2.1. Compute the probability that an employee picked at random 
will belong to each of the sets specified in (a) through (i). 


3.7 BAYES' THEOREM 

Thomas Bayes (1702-1761) was an English clergyman interested in mathematics. 
His name is associated with an area of probability that concerns a method of 
estimating the probabilities of the causes that may have produced an observed 
event. Bayes’ work in this area, which was published shortly after his death, was 
reprinted recently in Biometrika. [See Bayes (1763).] The general principle, in 
the form in which we now know it, is due to Laplace [see Todhunter (1931)]. 
However, it is summarized in a theorem that bears Bayes’ name. This theorem 
may be stated as follows: 
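
A standard way to write the theorem is given here as a reference sketch in the notation of Example 3.7.1; it is not the text's own wording. Suppose $B_1, B_2, \ldots, B_n$ are mutually exclusive events, exactly one of which must occur, and A is an event with $P(A) \neq 0$. Then, for each $B_i$,

$$P(B_i|A) = \frac{P(A|B_i)P(B_i)}{\sum_{j=1}^{n} P(A|B_j)P(B_j)} = \frac{P(A \cap B_i)}{\sum_{j=1}^{n} P(A \cap B_j)}$$

The second form, a joint probability divided by the sum of all the joint probabilities, is the one applied in Example 3.7.1. Treating it as Equation 3.7.2 is an assumption based on how that example cites it. The two forms agree because, by the multiplication rule, $P(A \cap B_j) = P(A|B_j)P(B_j)$.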




EXAMPLE 3.7.1 In an office, three clerks process incoming copies of a certain 
form. The first clerk, $B_1$, processes 40% of the forms. The second clerk, $B_2$, 
processes 35%. The third clerk, $B_3$, processes 25%. The first clerk has an error 
rate of 0.04, the second has an error rate of 0.06, and the third has an error rate 
of 0.03. A form selected at random from a day’s output is found to have an error. 
The supervisor wishes to know the probability that it was processed by the first, 
second, or third clerk, respectively. 

Let A designate the event that a form containing an error is selected at random. 
Let $B_1$, $B_2$, and $B_3$ be the events that the form was processed by the first, second, 
and third clerk, respectively. Using our usual notation, we want to compute the 
following conditional probabilities: $P(B_1|A)$, $P(B_2|A)$, and $P(B_3|A)$. From the 
information given, we have the following unconditional probabilities: $P(B_1) = 0.40$, 
$P(B_2) = 0.35$, and $P(B_3) = 0.25$. These probabilities, which we can obtain 
without additional information, are called prior probabilities. 

We are also told that the conditional probabilities of finding a form with an 
error, given that it was processed by one of the three clerks, are: $P(A|B_1) = 0.04$, 
$P(A|B_2) = 0.06$, and $P(A|B_3) = 0.03$. These probabilities are called the likelihoods. 
From them, we can calculate the three joint probabilities: 

$$P(A \cap B_1) = P(A|B_1)P(B_1) = (0.04)(0.40) = 0.016$$

$$P(A \cap B_2) = P(A|B_2)P(B_2) = (0.06)(0.35) = 0.021$$

$$P(A \cap B_3) = P(A|B_3)P(B_3) = (0.03)(0.25) = 0.0075$$

We can now use Equation 3.7.2 to obtain the desired probabilities. 




$$P(B_1|A) = \frac{P(A \cap B_1)}{P(A \cap B_1) + P(A \cap B_2) + P(A \cap B_3)} = \frac{0.016}{0.016 + 0.021 + 0.0075} = \frac{0.016}{0.0445} = 0.36$$

$$P(B_2|A) = \frac{P(A \cap B_2)}{0.0445} = \frac{0.021}{0.0445} = 0.47$$

$$P(B_3|A) = \frac{P(A \cap B_3)}{0.0445} = \frac{0.0075}{0.0445} = 0.17$$

TABLE 3.7.1 Summary of calculations illustrating the use of Bayes' theorem, Example 3.7.1 

Event    Prior probability   Likelihood     Joint probability   Posterior probability 
         $P(B_i)$            $P(A|B_i)$     $P(A \cap B_i)$     $P(B_i|A)$ 
$B_1$         0.40              0.04            0.0160               0.36 
$B_2$         0.35              0.06            0.0210               0.47 
$B_3$         0.25              0.03            0.0075               0.17 
Total         1.00               --             0.0445               1.00 

We call these posterior probabilities because they were calculable after it was 
known that the form was one that had an error. Table 3.7.1 summarizes these 
calculations. Figure 3.7.1 is a graphic representation (not drawn to scale) of the 
calculations in Table 3.7.1. 
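
The columns of Table 3.7.1 can be reproduced with a short function. This is a sketch only: the name `posterior` and the dictionary keys are assumptions, and ordinary floating-point arithmetic is used, so results are rounded to two places for display as in the table.

```python
def posterior(priors: dict, likelihoods: dict) -> dict:
    """Return P(B_i|A) for each event B_i, given P(B_i) and P(A|B_i)."""
    # Joint probabilities P(A and B_i) = P(A|B_i) * P(B_i).
    joints = {k: likelihoods[k] * priors[k] for k in priors}
    # Their sum is P(A), the overall probability of a form with an error.
    p_a = sum(joints.values())
    # Each posterior is a joint probability divided by that sum.
    return {k: joints[k] / p_a for k in joints}


# Example 3.7.1: share of forms processed and error rate for each clerk.
priors = {"B1": 0.40, "B2": 0.35, "B3": 0.25}
likelihoods = {"B1": 0.04, "B2": 0.06, "B3": 0.03}

post = posterior(priors, likelihoods)
print({k: round(v, 2) for k, v in post.items()})
# {'B1': 0.36, 'B2': 0.47, 'B3': 0.17}
```

Although the first clerk handles the most forms, the second clerk's higher error rate makes that clerk the most probable source of a form found to contain an error.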


Bayes’ theorem today is considered the cornerstone of modern decision theory, 
sometimes called Bayesian decision theory (see Chapter 15). Its use in this regard, 
however, has some critics. The criticism focuses not on its mathematical integrity, 
but on the way the prior probabilities are sometimes determined. These prior 
probabilities frequently are arrived at subjectively or intuitively, rather than being 
based on objective data. 

Exercises 

3.7.1 In a survey, 1000 adult males are cross-classified by father’s occupational status 
and by whether or not they have surpassed their fathers in occupational status. The 
following table shows the results. A man is selected at random from this group for further 
interviews. It is found that he has surpassed his father’s occupational status. What is the 
probability that his father (a) was an unskilled laborer? (b) was a semiskilled or skilled 
laborer? (c) held a clerical or sales job? (d) held a semiprofessional or 
low-level-management job? 

                                                         Son 
Father’s occupational status                   Surpassed   Not surpassed 
Unskilled labor                                    250          100 
Semiskilled and skilled labor                      150          100 
Clerical and sales                                 115          110 
Semiprofessional and low-level management           70          105 



FIGURE 3.7.1 Graphic representation of Bayes' theorem in Example 3.7.1 


3.7.2 A mail-order business receives orders from 4 areas of the country, as shown in the 
following table. Ten percent of the orders from the East, 25% of those from the South, 
5% from the Midwest, and 15% from the West are for more than $10. An order picked 
at random from the files is found to be for more than $10. What is the probability that it 
came from (a) the East? (b) the South? (c) the Midwest? (d) the West? 


Percent of orders 


East 

South 

Midwest 

West 










3.7.3 In a suburban community, 30% of the households use Brand A toothpaste, 27% use 
Brand B, 25% use Brand C, and 18% use Brand D. In the four groups of households, the 
proportions of residents who learned about the brand they use through television advertising 
are as follows: Brand A, 0.10; Brand B, 0.05; Brand C, 0.20; and Brand D, 0.15. In a 
household selected at random from the community, it is found that residents learned about 
the toothpaste through television advertising. What is the probability that the brand of 
toothpaste used in the household is (a) A? (b) B? (c) C? (d) D? 

Summary 

This chapter presented some of the basic concepts of probability. Since the use 
of set terminology makes it easier to discuss probability, we gave some of the 
ideas of set theory at the beginning of the chapter. Some of the terms that were 
defined are: set, subset, null set, disjoint set, universal set, union, and intersection. 

You learned that various sets and subsets can be portrayed visually by a graphic 
device known as a Venn diagram. 

To ease the task of counting certain events in order to compute their associated 
probabilities, we introduced some counting rules. You learned how to determine 
the number of permutations that can be made from n objects taken r at a time. 
You also learned about another type of arrangement called a combination. We 
showed how to find the number of combinations that can be made from n things 
taken r at a time. 

You learned that we can discuss probability from three different points of view: 
the a priori or classical, the a posteriori or relative frequency, and the subjective. 

You learned three elementary properties of probability for a given set of 
mutually exclusive events $E_1, E_2, \ldots, E_n$. 

1. $0 \le P(E_i) \le 1$ 

2. $P(E_1) + P(E_2) + \cdots + P(E_n) = 1$ 

3. $P(E_i \text{ or } E_j) = P(E_i) + P(E_j)$ 

In calculating probabilities, one needs to understand the concepts of mutually 
exclusive events, independent events, unconditional probability, conditional 
probability, marginal probability, and joint probability. 

You learned the addition rule and the multiplication rule, two rules that are 
helpful in calculating certain probabilities. 

Finally, this chapter discussed Bayes’ theorem, which provides a formula for 
computing a conditional probability. This theorem is called the formula for the 
probability of “causes,” since it enables us to find the probability of a particular 
$B_i$, or “cause,” which may have brought about event A. 

You can find a deeper and broader coverage of probability in books by Bates 
(1965), Berman (1969), Dwass (1970), Hausner (1977), Hymans (1967), and 
Mosteller et al. (1970). The history of the development of probability theory is 
very interesting. David (1962) and Todhunter (1931) have treated this aspect of 
the subject. 

Review Questions 

1. Define the following: 

(a) probability 
(b) objective probability 
(c) subjective probability 
(d) classical probability 
(e) relative frequency 
(f) mutually exclusive events 
(g) independence 
(h) marginal probability 
(i) joint probability 
(j) conditional probability 



2. Name and explain the three properties of probability. 

3. What is a set? Give three examples of sets. 

4. Define the following: (a) unit set, (b) subset, (c) empty set, (d) disjoint set, 
(e) universal set, (f) complement 

5. Under what condition are two sets considered equal? 

6. Define and give an example of the union of two sets. 

7. Define and give an example of the intersection of two sets. 

8. What is a Venn diagram? 

9. Define and illustrate the following: (a) factorial, (b) permutation, (c) combination, 
(d) addition rule, (e) multiplication rule 

10. Make up a realistic problem from your area of interest to illustrate the use of Bayes’ 
theorem. 

11. A firm interviews 15 applicants for a position on its sales force. One question on the 
application form concerns leisure-time activities. Six applicants say they spend a major 
portion of their leisure time playing golf. Ten mention bowling. Three do not mention 
sports at all. How many applicants spend their leisure time both golfing and bowling? 
Illustrate with a Venn diagram. [Hint: $n(A \cup B) = n(A) + n(B) - n(A \cap B)$.] 

12. At a convention, 4 men and 2 women are to be seated at the head table. How many 
different arrangements are possible if the 6 persons are distinguishable only with respect 
to sex? 

13. A final examination consists of 20 questions. A student who takes the exam is told 
to select and answer 15. How many possible tests does a student have from which to 
choose? 

14. Of 1000 items produced in a day in a certain factory, 400 are produced on the first 
shift, 350 on the second, and 250 on the third. An item is picked at random. What is the 
probability that it was produced on (a) the first shift? (b) the second shift? (c) the third 
shift? (d) Either the first or the second shift? 

15. In Question 14, suppose that the proportions of defective items produced on the first, 
second, and third shifts are 0.01, 0.02, and 0.04, respectively. An item is picked at 
random, (a) What is the probability that it is defective? (b) What is the probability that it 
is defective, given that it was produced on the third shift? (c) What is the probability that 
it is defective and also was produced on the first shift? 

16. A race driver uses Make A cars 50% of the time, Make B cars 30% of the time, and 
Make C cars 20% of the time. Of 25 races he has entered with Make A cars, he has won 
5. In 15 races with Make B cars, he has won 4. In 10 races with Make C cars, he has 
won 4. He has just won a race. What is the probability that the car he was driving was 
(a) Make A? (b) Make B? (c) Make C? 

17. Set C consists of the employees of a certain firm who voted in favor of a new insurance 
plan. Set D consists of the employees of the same firm who have children in school.
Define: (a) $C \cup D$, (b) $C \cap D$, (c) $\bar{C}$, (d) $\bar{D}$

18. Express each of the following sets by a single symbol: (a) $A \cap \emptyset$, (b) $A \cap A$,
(c) $A \cup \emptyset$, (d) $A \cup A$


19. Express each of the following sets by a different symbol: (a) U, (b) (X), (c) 0, 
(d) (A1J0) 

20. A hundred business people are asked to specify the type of magazine they prefer. The 
following table shows the 100 responses, cross-classified by educational level and type of 
magazine preferred. Specify the number of members of each of the following sets: (a) S,
(b) $V \cup C$, (c) A, (d) W, (e) U, (f) B, (g) $T \cap B$, (h) $(T \cap C)'$


                                  Educational level
Type of magazine      High school (A)   College (B)   Graduate school (C)   Total
Sports (S)                  15               8                 7              30
General news (T)             3               7                20              30
Travel (V)                   5               5                15              25
Business news (W)           10               3                 2              15
Total                       33              23                44             100





21. An athletic team has 12 members. It plans to elect 4 officers—a captain, a co-captain, 
a manager, and a treasurer—by secret write-in ballot. All 12 members are eligible and 
willing to serve. How many possible sets of 4 members can serve? [Ignore the office held.] 

22. In a certain computer center there are 7 keypunch operators who must sit at machines 
that are placed one behind the other. In how many ways can the 7 operators be assigned 
to the machines? 

23. A salesperson has 10 clients to visit in a week. The clients are 3 manufacturers, 4 
retail stores, 2 office-supply firms, and 1 government agency. How many distinguishable 
visiting arrangements can the salesperson prepare if he or she wishes to distinguish among 
the clients only on the basis of type of business? 

24. The probability that a salesperson will make a sale is 0.8. What is the probability 
(assuming independence) that on two calls made in a day, the salesperson will make two 
sales? 

25. The following table shows the outcome of 500 interviews attempted during a survey 
of opinions about big business held by residents of a certain city. The data are also classified 
by the area of the city in which the interview was attempted. A questionnaire is selected 
at random from the 500. (a) What is the probability that: (1) The questionnaire was 
completed? (2) The potential respondent was not at home? Refused to answer? (3) The 
potential respondent lived in area A? B? D? E? (4) The questionnaire was completed, 
given that the potential respondent lived in area B? (5) The potential respondent refused 
to answer the questionnaire or lived in area D? (b) Calculate the following probabilities:
(1) $P(A \cap R)$, (2) $P(B \cup C)$, (3) $P(D)$, (4) $P(N \mid D)$, (5) $P(B \mid R)$, (6) $P(C)$.


                           Outcome of interview
Area of city   Completed (C)   Not at home (N)   Refused (R)   Total
A                  100               20               5         125
B                  115                5               5         125
D                   50               60              15         125
E                   35               50              40         125
Total              300              135              65         500






26. In a large city, 70% of the households receive a daily newspaper, and 90% have a 
television set. Suppose that these two events are independent. What is the probability that 
a randomly selected household will be one that receives a daily newspaper and has a 
television set? 

27. In a certain factory, 10% of the assembly-line employees have only a fourth-grade 
education or less. The educational status of the remainder is as follows: completed grade 
five, six, or seven, 50%; completed eighth grade or higher, 40%. Of the first group, 20% 
are under 25 years old. Of the second group, 50% are under 25. Of the third group, 70% 
are under 25. An employee picked at random from this population is found to be under 
25. What is the probability that the employee has a fourth-grade education or less? Find 
the probabilities for the other two groups. 

28. Of the persons employed by a company, 45% come from one area of the city, area 
A; 30% from a second area, B; and 25% from a third area, C. Within one year, 30% of 
the employees from area A, 20% from area B, and 10% from area C leave the firm. A 
record is picked at random from the firm’s files. It is found that the person under consideration
left the firm within one year. What is the probability that the person was from area
A? area B? area C? 

29. In a certain business firm, the probability that an employee picked at random is a 
college graduate is 0.55. What is the probability that an employee picked at random is not 
a college graduate? 

30. In a government agency, 30% of the employees take public transportation to work. 
Also, 60% of the employees are female. It is assumed that these two characteristics are 
independent. Draw a tree diagram to illustrate and find the probability that an employee 
picked at random from this population will be: (a) female and take public transportation 
to work; (b) female and not take public transportation to work; (c) male and take public 
transportation to work; (d) male and not take public transportation to work. 

31. In a certain firm, 6 persons (4 females and 2 males) are eligible for promotion to 2 
higher-paying positions, (a) How many different combinations of these employees are 
eligible for promotion? List them. 

Assume that the persons who will be promoted are to be selected at random. Find the 
probability that: (b) at least one of the persons promoted will be a female, (c) exactly one 
female will be promoted, (d) no more than one female will be promoted, (e) no female 
will be promoted. 

32. There are 50 applicants for a job. Ten are members of ethnic group A, 15 of ethnic 
group B, and 25 of ethnic group C. The numbers of females (F) in the three groups are 
2, 9, and 15, respectively. A person is selected at random to fill the job. Find the probability 
that this person will be (a) A and F, (b) A or F or both, (c) F given B, (d) A and B, (e) 
F and C 

33. Suppose that there are two events $E_1$ and $E_2$ such that $P(E_1) = 0.20$ and $P(E_2) =
0.30$. (a) Given that $E_1$ and $E_2$ are independent, what is $P(E_1 \cap E_2)$? (b) Given that $E_1$
and $E_2$ are mutually exclusive, what is $P(E_1 \cap E_2)$? (c) Given that $E_1$ and $E_2$ are mutually
exclusive, what is $P(E_1 \cup E_2)$?

34. In a certain firm, 60% of the employees are males. Furthermore, 80% of the females 
and 60% of the males are high school graduates. Find the probability that an employee 
picked at random will be (a) a male high school graduate, (b) a female high school 
graduate. 



35. An office supervisor claims to be a good judge of human intelligence. To test this 
claim, a psychologist asks the supervisor to pick the three most intelligent workers from 
a group of nine and rank them in order of intelligence. Suppose that the supervisor actually 
has no special ability to evaluate intelligence, (a) What is the probability that the supervisor 
makes a correct selection and ranking? (b) What is the probability that the supervisor will 
select at least one of the three most intelligent workers? 

36. A personnel director has found that she can fill a certain type of position within one 
week 70% of the time. But she finds that 60% of the time, all applicants for the position 
are high school dropouts. The other 40% of the time, none of them are dropouts. When 
all applicants are high school dropouts, the position is filled within one week only 56% 
of the time, (a) What is the probability that the position is filled within one week, given 
that the applicants are not high school dropouts? (b) Is filling the position within a week 
independent of whether the applicants are high school dropouts? 

37. The personnel director of a firm has determined the following probabilities for the 
length of time a certain type of employee remains with the firm after being hired. 


Length of stay Probability 


Less than 1 year 0.05 

1 year but less than 2 0.10 

2 years but less than 3 0.10 

3 years but less than 4 0.15 

4 years but less than 5 0.20 

5 years or longer 0.40 

1.00 


Find the probability that a newly hired employee of this type will be with the firm (a) less 
than 5 years, (b) 3 years or more, (c) at least 2 but less than 4 years, (d) at least 4 years. 



4. Some Important Probability 
Distributions 


Chapter Objectives: This chapter deals with the basic
concepts you need to understand random variables and
probability distributions. These concepts and techniques
provide the foundation for the statistical inference procedures
that we discuss later. You will learn to use theoretical
distributions that help you to get approximations
of many probability distributions found in business situations.
Distributions such as these are a necessary background
for understanding a special type of probability
distribution that we shall encounter in Chapter 5. After
studying this chapter and working the exercises, you
should be able to do the following.

1. Distinguish between discrete and continuous random 
variables 

2. Construct a probability distribution from raw data 

3. Compute the mean and variance of a probability distribution

4. Use the binomial, Poisson, hypergeometric, and normal
distribution models to calculate probabilities for
appropriate random variables

5. Determine which model (the binomial, Poisson, hypergeometric,
or normal) is appropriate for describing
a given situation



4.1 INTRODUCTION 


In Chapter 3 we presented the basic concepts of probability theory, as well as 
methods for computing the probability of an event. This chapter builds on those 
methods and concepts. It introduces techniques for calculating the probability of 
an event under more complicated circumstances. 

We shall discuss the topic of this chapter, probability distributions, under two 
headings: probability distributions of discrete random variables, and probability
distributions of continuous random variables. Recall that Chapter 2 discussed 
discrete and continuous random variables. 


4.2 PROBABILITY DISTRIBUTIONS OF DISCRETE RANDOM VARIABLES 

Let us begin with the following definition: 

The probability distribution of a discrete random variable is a table, graph, 
formula, or other device used to specify all possible values of the discrete 
random variable, along with their respective probabilities. 

EXAMPLE 4.2.1 A certain firm employs 50 salespersons. Let us construct the probability
distribution of the random variable X, the number of new customers each
salesperson obtained during the past year. We can do this by means of a table in
which one column lists the possible values that X can assume. Another column
lists $P(X = x_i)$, the probability of X assuming each value. Table 4.2.1 shows the
probability distribution of X for our firm. The entries in the last column are the
relative frequencies of occurrence of values of X.

Alternatively, we may represent the probability distribution by a graph. In
Figure 4.2.1, the length of each vertical line indicates the probability for the
corresponding value of x in this example. The values of $P(X = x_i)$ are all positive,
they are all less than 1, and their sum is equal to 1. These characteristics are not


TABLE 4.2.1 Probability distribution of number of new customers obtained by 50 salespersons

$x_i$     Frequency of occurrence of $x_i$     $P(X = x_i)$
  0                    1                         1/50
  1                    2                         2/50
  2                    4                         4/50
  3                    3                         3/50
  4                    6                         6/50
  5                    8                         8/50
  6                   10                        10/50
  7                    7                         7/50
  8                    5                         5/50
  9                    3                         3/50
 10                    1                         1/50
Total                 50                        50/50



FIGURE 4.2.1 Probability distribution of number of new customers obtained by 50 salespersons

peculiar to this particular example. They are the essential properties of the probability
distribution of a discrete random variable. This may be expressed more
formally as follows:


Given a discrete random variable X that can assume only the k different values
$x_1, x_2, \ldots, x_k$, the probability distribution of X must satisfy the following two
conditions:

    $0 \le P(X = x_i) \le 1, \quad i = 1, 2, \ldots, k$    (4.2.1)

    $\sum_{i=1}^{k} P(X = x_i) = 1$    (4.2.2)


We can make probability statements about the random variable X once we know 
its probability distribution. Suppose, for example, that a salesperson is picked at 
random from the 50. What is the probability of selecting a salesperson who got 
four new customers? The last column of Table 4.2.1 shows that the answer is 
6/50 = 0.12. That is, P(X = 4) = 6/50 = 0.12. 

What is the probability that a salesperson selected at random is one who got 
either five or six new customers? To answer this question, we use the addition 
rule. The probability of selecting a salesperson who got five new customers is 
8/50. The probability of selecting a salesperson who got six new customers is 
10/50. Thus the probability of selecting a salesperson who got either five or six 
new customers is 8/50 + 10/50 = 0.16 + 0.20 = 0.36. We express this more 
compactly as P(X = 5 or 6) = P(X = 5) + P(X = 6) = 0.36. 
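
The table lookups above can be reproduced with a short Python sketch. It is only an illustration, not part of the text's method: the names `frequencies` and `pmf` are chosen here for convenience, and exact fractions are used so that the two conditions on a probability distribution can be checked without rounding.

```python
from fractions import Fraction

# Frequencies from Table 4.2.1: new customers obtained -> number of salespersons
frequencies = {0: 1, 1: 2, 2: 4, 3: 3, 4: 6, 5: 8, 6: 10, 7: 7, 8: 5, 9: 3, 10: 1}

total = sum(frequencies.values())  # 50 salespersons in all

# The relative frequencies serve as the probabilities P(X = x)
pmf = {x: Fraction(count, total) for x, count in frequencies.items()}

# The two essential properties: each probability lies in [0, 1], and they sum to 1
assert all(0 <= p <= 1 for p in pmf.values())
assert sum(pmf.values()) == 1

p_four = pmf[4]                   # P(X = 4)
p_five_or_six = pmf[5] + pmf[6]   # addition rule for mutually exclusive values

print(float(p_four))          # 0.12
print(float(p_five_or_six))   # 0.36
```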

Cumulative Distributions

At times it will be more convenient to compute probabilities
using the cumulative probability distribution of a random variable. To
obtain the cumulative probability distribution for the discrete variable whose probability
distribution is given in Table 4.2.1, we successively add the probabilities
$P(X = x_i)$ in the last column. Table 4.2.2 shows the resulting cumulative probability
distribution.


TABLE 4.2.2 Cumulative probability distribution of number of new customers obtained by 50 salespersons

  x      Frequency of occurrence of x     $P(X = x)$     $P(X \le x)$
  0                   1                      1/50            1/50
  1                   2                      2/50            3/50
  2                   4                      4/50            7/50
  3                   3                      3/50           10/50
  4                   6                      6/50           16/50
  5                   8                      8/50           24/50
  6                  10                     10/50           34/50
  7                   7                      7/50           41/50
  8                   5                      5/50           46/50
  9                   3                      3/50           49/50
 10                   1                      1/50           50/50
Total                50                     50/50

FIGURE 4.2.2 Cumulative probability distribution of number of new customers obtained by 50 salespersons



The graph in Figure 4.2.2 shows the cumulative probability distribution of X
for this example. We call the cumulative probability distribution F(x). That is,
$F(x) = P(X \le x)$, the probability that X is less than or equal to any value of x.
The graph of F(x) consists of the horizontal lines only. The vertical lines only
give the graph a connected appearance. The length of each vertical line is equal
to that of the corresponding line in Figure 4.2.1. For example, the vertical line
at X = 6 in Figure 4.2.2 is equal in length to the line erected at X = 6 in Figure
4.2.1, or 10/50 units on the vertical scale.

The cumulative probability distribution lets us answer such questions as: 

1. What is the probability that a salesperson picked at random got fewer than
four new customers during the past year? To answer this question, we need to
find $P(X < 4)$, or $P(X \le 3)$. We can find this in Table 4.2.2 by noting the value
of $P(X \le x)$ for x = 3. We find this to be 10/50 = 0.20.

2. What is the probability that a randomly selected salesperson got four or more
new customers during the past year? The answer is the complement of the answer
to the previous question. Since we have already found $P(X < 4) = 0.20$, we
have $P(X \ge 4) = 1 - P(X < 4) = 1 - 0.20 = 0.80$.



3. What is the probability that a salesperson selected at random got between five
and eight new customers, inclusive? To answer this, we need $P(5 \le X \le 8)$,
which is equal to $P(X \le 8) - P(X < 5)$. From Table 4.2.2, we find $P(X \le 8)
= 46/50$ and $P(X < 5) = P(X \le 4) = 16/50$, so that $P(5 \le X \le 8) = 46/50 - 16/50 =
30/50 = 0.60$.
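
The three questions above can be checked with a small Python sketch that builds F(x) as a running sum. This is a minimal illustration under stated assumptions: the helper `F` and the variable names are introduced here and do not come from the text.

```python
from fractions import Fraction
from itertools import accumulate

# Values of X and their frequencies, from Table 4.2.1
values = list(range(11))
frequencies = [1, 2, 4, 3, 6, 8, 10, 7, 5, 3, 1]
total = sum(frequencies)

# Successively add the frequencies to obtain P(X <= x), as in Table 4.2.2
cumulative = {
    x: Fraction(running, total)
    for x, running in zip(values, accumulate(frequencies))
}

def F(x):
    """Cumulative probability P(X <= x) for an integer x no larger than 10."""
    return cumulative[x] if x >= 0 else Fraction(0)

fewer_than_four = F(3)          # P(X < 4) = P(X <= 3)
four_or_more = 1 - F(3)         # complement of the previous answer
five_to_eight = F(8) - F(4)     # P(5 <= X <= 8) = P(X <= 8) - P(X <= 4)

print(float(fewer_than_four), float(four_or_more), float(five_to_eight))
```

Writing the interval probability as F(8) - F(4) makes it explicit that the value 5 itself is kept in the interval.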

The probability distribution in Table 4.2.1 was developed from actual experience.
To find another variable that followed this distribution would be a coincidence.
However, the probability distribution of a discrete random variable of
interest often follows, or approximates, some probability distribution that has been
named and extensively studied. The next three sections introduce three well-known
distributions: the binomial distribution, the Poisson distribution, and the
hypergeometric distribution.


The Mean and Variance of Discrete Probability Distributions


We introduced the concepts of mean and variance in Chapter 2, and we discussed 
the calculation of their measures for samples and finite populations. This section 
treats these concepts in terms of discrete probability distributions. 

The mean of a probability distribution is the expected value of the random
variable that has the specified distribution. The expected value of a discrete random
variable X is merely the arithmetic mean. Therefore it may be labeled $\mu$. To obtain
it, we multiply each value of the random variable by its probability of occurrence
and sum the products. We express this procedure symbolically as follows:

    $E(X) = \sum x P(X = x) = \mu$    (4.2.3)

Similarly we define the variance of the probability distribution of the random
variable X as the expected value of the squared deviations of the values of X from
their mean. In symbols, we write this as

    $E(X - \mu)^2 = \sum (x - \mu)^2 P(X = x) = \sigma^2$    (4.2.4)

Alternatively we may write the variance of X as

    $\sigma^2 = E(X^2) - [E(X)]^2$    (4.2.5)

where $E(X^2) = \sum x^2 P(X = x)$.


EXAMPLE 4.2.2 Suppose that the random variable X has the probability distribution
shown in the first two columns of Table 4.2.3. We compute the mean and variance
of X as follows.

    $\mu = E(X) = \sum x P(X = x)$
    $\quad = 1(1/6) + 2(1/6) + 3(2/6) + 4(1/6) + 5(1/6) = 3$

    $\sigma^2 = E(X - \mu)^2 = \sum (x - \mu)^2 P(X = x)$
    $\quad = (1 - 3)^2(1/6) + (2 - 3)^2(1/6) + (3 - 3)^2(2/6) + (4 - 3)^2(1/6) + (5 - 3)^2(1/6)$
    $\quad = 10/6 = 1.67$



TABLE 4.2.3 Calculations for obtaining the mean and variance of the random variable X

  x     $P(X = x)$   $x - \mu$   $(x - \mu)^2$   $xP(X = x)$   $(x - \mu)^2 P(X = x)$   $x^2$
  1        1/6          -2            4              1/6                4/6                1
  2        1/6          -1            1              2/6                1/6                4
  3        2/6           0            0              6/6                 0                 9
  4        1/6           1            1              4/6                1/6               16
  5        1/6           2            4              5/6                4/6               25
Total       1                                         3            10/6 = 1.67

Column totals: $\sum xP(X = x) = 3$ and $\sum (x - \mu)^2 P(X = x) = 10/6 = 1.67$.



From the data in the last column of Table 4.2.3, we compute

    $E(X^2) = 1(1/6) + 4(1/6) + 9(2/6) + 16(1/6) + 25(1/6) = 64/6 = 10.67$

By Equation 4.2.5, then, we may compute

    $\sigma^2 = 10.67 - (3)^2 = 1.67$


This result agrees with that obtained using Equation 4.2.4. 
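
As a cross-check of Example 4.2.2, the following Python sketch computes the mean and then the variance by both Equation 4.2.4 and Equation 4.2.5. It is an illustration only; the dictionary `pmf` and the other variable names are assumptions made for this sketch, and exact fractions are used so that the two variance formulas can be compared without rounding error.

```python
from fractions import Fraction

# Probability distribution from the first two columns of Table 4.2.3
pmf = {
    1: Fraction(1, 6),
    2: Fraction(1, 6),
    3: Fraction(2, 6),
    4: Fraction(1, 6),
    5: Fraction(1, 6),
}

# Equation 4.2.3: the mean is the probability-weighted sum of the values
mean = sum(x * p for x, p in pmf.items())

# Equation 4.2.4: expected squared deviation from the mean
variance_by_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Equation 4.2.5: E(X^2) minus the square of E(X)
second_moment = sum(x ** 2 * p for x, p in pmf.items())
variance_by_shortcut = second_moment - mean ** 2

print(mean)                                      # 3
print(round(float(variance_by_definition), 2))   # 1.67
print(round(float(second_moment), 2))            # 10.67
```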

Equations 4.2.3 and 4.2.4 are examples of weighted arithmetic descriptive
measures in which the weights are probabilities. The calculations are similar to
those for computing the mean and variance of grouped data, discussed in Chapter 2.

For the probability distribution of a continuous random variable, we find the
mean and variance by integrating $x f(x)$ and $(x - \mu)^2 f(x)$ over the possible values
of x. The computations are analogous to Equations 4.2.3 and 4.2.4.

Exercises

4.2.1 The following table shows the distribution of the number of days of sick leave taken
by 100 employees during a year. (a) Construct and graph the probability distribution of X
= days of sick leave taken. (b) Construct and graph the cumulative probability distribution.

$x_i$ (Days of sick leave taken)   0   1   2   3   4   5   6   7   8   9  10
Number of employees                5   8  10  12  18  14  10   9   8   4   2

4.2.2 Find, for Exercise 4.2.1, the probability that a randomly selected employee will be 
one who took: (a) 3 days sick leave, (b) more than 5 days sick leave, (c) 6 to 8 days 
(inclusive) sick leave, (d) either 9 or 10 days sick leave. 

4.2.3 For Exercise 4.2.1, find: (a) P(X = 0), (b) P(X = 10), (c) P(X > 6), 
(d) P(X < 6), (e) P (3 < X < 7). 

4.2.4 The following table shows the distribution of on-the-job accidents that befell 500
factory employees during a given year. (a) Construct and graph the probability distribution
of X = number of accidents. (b) Construct and graph the cumulative probability distribution.
(c) Find the probability that a randomly selected employee will be one who had
(1) no accident, (2) more than three accidents, (3) between two and four accidents, inclusive,
(4) fewer than four accidents, and (5) four or more accidents. (d) Find the mean and
variance.


Number of accidents 0 1 2 3 4 5 6 

Number of employees 300 100 60 20 10 5 5 

4.3 THE BINOMIAL DISTRIBUTION 

The binomial distribution is one of the most widely encountered probability distributions
in applied statistics. It is derived from a process known as a Bernoulli
trial. This is named for the Swiss mathematician James Bernoulli (1654-1705),
who made many contributions to probability theory. A Bernoulli trial is a trial of
some process or experiment that can result in only one of two mutually exclusive
outcomes, such as defective or not defective, correct or incorrect, present or
absent, acceptable or not acceptable. A sequence of Bernoulli trials forms a Bernoulli
process when the following conditions are met:

1. Each trial results in one of two possible, mutually exclusive outcomes. One 
of the possible outcomes is denoted (arbitrarily) as a success, the other as a failure. 

2. The probability of a success, p, remains constant from trial to trial. (The
probability of a failure, $1 - p$, is denoted by q.)

3. The trials are independent; that is, the probabilities associated with any particular
trial are not affected by the outcome of any other trial.

In n repetitions of a Bernoulli trial, the number of successes possible is 0, 1,
2, . . ., n. We want to be able to determine the probability of each possible
number of successes in n repetitions, or trials. The distribution from which we 
determine these probabilities is called the binomial distribution. 

EXAMPLE 4.3.1 A horticulturist knows from experience that 90% of a certain kind 
of seedling will survive being transplanted. A random sample of 5 seedlings is 
selected from current stock. What is the probability that exactly 3 will survive? 

The probability of survival is 0.90 for each seedling. Let us call survival a 
success and nonsurvival a failure. Also let us assign a value of 1 to a success 
(survival) and a value of 0 to a failure (nonsurvival). The actual random selection 
of a seedling is a Bernoulli trial. 

Suppose that the first seedling survives (S), the second seedling fails to survive 
(F), the third and fourth survive, and the fifth fails to survive. We record the 
following sequence of outcomes: SFSSF. Using zeroes and ones, we may write 
the sequence of outcomes as 10110. We find the probability of this sequence of 
outcomes using the multiplication rule. It is given by 

    $P(1, 0, 1, 1, 0) = pqppq = q^2 p^3$

We are looking for the probability of a success, a failure, a success, a success,
and a failure, in that order. In other words, we want the joint probability of the
5 outcomes. (For simplicity, we have used commas, rather than intersection notation,
to separate the outcomes of the events in the probability statement.)



The resulting probability is the probability of obtaining the specific sequence 
of outcomes in the order shown. But we are not interested in the order in which 
the successes and failures occur. Rather we are interested in the probability of the 
occurrence of exactly 3 successes (survivals) out of 5 randomly selected seedlings. 
In addition to the given sequence (call it sequence 1), 3 successes and 2 failures 
could also occur in any one of the sequences in Table 4.3.1. Each of these 
sequences has the same probability of occurring, $q^2 p^3$.

A single sample of size 5, drawn from the population specified, yields only one
sequence of successes and failures. The question we must answer is: What is the
probability of getting sequence 1, or sequence 2, . . ., or sequence 10? To find
the answer, we use the addition rule to calculate the sum of the individual probabilities.
In this example, we need to find the sum of the 10 $q^2 p^3$'s or, equivalently,
multiply $q^2 p^3$ by 10.

We can now answer the original question: What is the probability that in a
random sample of size 5, drawn from the specified population, there are 3 successes
(survivals) and 2 failures (nonsurvivals)? Since p = 0.90 and
$q = 1 - p = 1 - 0.90 = 0.10$, the answer is

    $10(0.10)^2(0.90)^3 = 10(0.01)(0.729) = 0.0729$


Figure 4.3.1 illustrates the solution to this problem with a tree diagram. The 
probability of each individual event (S or F) is given in parentheses on the branch 
representing the event. 

As the size of the sample increases, it becomes more and more difficult to 
construct a tree diagram or list the number of sequences. We need an easy method 
of counting them. Since a sequence of outcomes consists of n things, some of 
which are of one type and the rest of which are of another type, we can use 
Equation 3.3.6 to count the number of sequences. Using this equation, we find 
the number of sequences to be 


    $\binom{5}{3} = \frac{5!}{3!\,2!} = \frac{120}{12} = 10$


In general, if n equals the total number of objects, x the number of objects of one 
type, and n — x the number of objects of the other type, the number of sequences 
is equal to 

    $\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$

which is equal to the number of combinations of n things taken x at a time. 
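
The counting argument of Example 4.3.1 can be illustrated by brute force in Python: list every sequence of five 0s and 1s, keep those with exactly three 1s, and compare the count with the combination formula. This is a sketch for illustration only; the variable names are assumptions introduced here.

```python
from itertools import product
from math import comb

n, x = 5, 3     # 5 seedlings, exactly 3 survivals
p = 0.90        # probability of a success (survival)
q = 1 - p       # probability of a failure

# Every sequence of 0s and 1s of length n that contains exactly x ones
sequences = [s for s in product((0, 1), repeat=n) if sum(s) == x]

# The number of such sequences equals the number of combinations of n things taken x at a time
assert len(sequences) == comb(n, x)

# Each sequence has the same probability, q^(n-x) * p^x
prob_one_sequence = q ** (n - x) * p ** x

# Addition rule: the sequences are mutually exclusive, so their probabilities add
prob_exactly_x = len(sequences) * prob_one_sequence

print(len(sequences))             # 10
print(round(prob_exactly_x, 4))   # 0.0729
```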


TABLE 4.3.1 Additional sequences for 3 successes and 2 failures

Sequence number   Sequence     Sequence number   Sequence
       2           11100              6           10101
       3           10011              7           01110
       4           11010              8           00111
       5           11001              9           01011
                                     10           01101



FIGURE 4.3.1 Tree diagram for Example 4.3.1 (each path with 3 successes and 2 failures has
probability 0.00729; the 10 such paths total 0.07290)


We can write the probability of obtaining exactly x successes in n trials, then,
as

    $f(x) = \begin{cases} \binom{n}{x} q^{n-x} p^{x} & \text{for } x = 0, 1, 2, \ldots, n \\ 0 & \text{elsewhere} \end{cases}$    (4.3.1)


This expression is the binomial distribution. In Equation 4.3.1, f(x) = P(X =
x), where X is the random variable, number of successes in n trials. We use f(x)
rather than P(X = x) not only because it is shorter, but also because it is commonly
used.

Table 4.3.2 shows the binomial distribution in tabular form. 

To establish the fact that Equation 4.3.1 has the two essential properties of a 
probability distribution, consider the following: 

1. $f(x) \ge 0$ for all real values of x. This follows from the fact that n and p are
both nonnegative, and hence

    $\binom{n}{x}$, $p^x$, and $(1 - p)^{n-x}$

are all nonnegative. Consequently their product is greater than or equal to 0.



TABLE 4.3.2 The binomial distribution

Number of successes, x     Probability, f(x)
        0                  $\binom{n}{0} q^{n-0} p^{0}$
        1                  $\binom{n}{1} q^{n-1} p^{1}$
        2                  $\binom{n}{2} q^{n-2} p^{2}$
        .                  .
        x                  $\binom{n}{x} q^{n-x} p^{x}$
        .                  .
        n                  $\binom{n}{n} q^{n-n} p^{n}$
      Total                1


2. $\sum f(x) = 1$. To see that this is true, we must recognize that

    $\sum_{x=0}^{n} \binom{n}{x} q^{n-x} p^{x} = [(1 - p) + p]^n = 1^n = 1$

the familiar binomial expansion. Expansion of the binomial $(q + p)^n$ yields

    $(q + p)^n = q^n + n q^{n-1} p^1 + \frac{n(n - 1)}{2} q^{n-2} p^2 + \cdots + n q^1 p^{n-1} + p^n$

Suppose that we compare the terms in this expansion term for term with the
f(x) in Table 4.3.2. We see that they are equivalent, since

    $f(0) = \binom{n}{0} q^{n-0} p^0 = q^n$

    $f(1) = \binom{n}{1} q^{n-1} p^1 = n q^{n-1} p^1$

    $f(2) = \binom{n}{2} q^{n-2} p^2 = \frac{n(n - 1)}{2} q^{n-2} p^2$

    $f(n) = \binom{n}{n} q^{n-n} p^n = p^n$

EXAMPLE 4.3.2 In a certain community, on a given evening, someone is at home 
in 85% of the households. Suppose that a researcher conducting a telephone survey 
randomly selects 12 households to call that evening. What is the probability that 
someone will be at home in exactly 7 households? 





The answer, by Equation 4.3.1, is

    $f(7) = \binom{12}{7}(0.15)^{12-7}(0.85)^7 = \frac{12!}{7!\,5!}(0.00007594)(0.320577) = 0.0193$
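
Equation 4.3.1 translates directly into a few lines of Python. The function below is a minimal sketch, not a library routine: the name `binomial_pmf` is an assumption made here, and it relies only on `math.comb` from the standard library. It reproduces the answer to Example 4.3.2 and confirms numerically that the probabilities sum to 1.

```python
from math import comb

def binomial_pmf(x, n, p):
    """f(x) of Equation 4.3.1: probability of exactly x successes in n trials."""
    if not 0 <= x <= n:
        return 0.0          # f(x) = 0 elsewhere
    q = 1 - p
    return comb(n, x) * q ** (n - x) * p ** x

# Example 4.3.2: someone at home in exactly 7 of 12 households, p = 0.85
f7 = binomial_pmf(7, 12, 0.85)
print(round(f7, 4))         # 0.0193

# Second essential property: the probabilities for x = 0, 1, ..., n sum to 1
total = sum(binomial_pmf(x, 12, 0.85) for x in range(13))
print(round(total, 10))     # 1.0
```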

The Binomial Table

When the sample size is large, using Equation 4.3.1 to calculate probabilities is
tedious. Fortunately probabilities for different values of n, p, and x have been
tabulated. Thus, instead of calculating the probabilities, we may consult a table
to find the desired result. Table A of the Appendix is one such table. This table
gives the probability that X is less than or equal to some specified value.

Some additional examples illustrate the use of Table A. 

EXAMPLE 4.3.3 An insurance company has found that 8% of its claims are for 
damages resulting from burglaries. What is the probability that a random sample 
of 20 claims will contain 5 or fewer that are for burglary damages? 

We seek the probability that $X \le 5$ when n = 20, p = 0.08, and q = 0.92.
The table gives the probability that $X \le x$, so we only need to locate the entry
corresponding to n = 20, p = 0.08, and x = 5. We find this to be 0.9962. We
can write the problem and its solution in a more compact notation as

    $P(X \le 5 \mid n = 20,\ p = 0.08) = 0.9962$

EXAMPLE 4.3.4 In the previous example, what is the probability that a sample of 
20 claims will contain more than 5 claims for damages resulting from burglaries? 

The answer to this question is the complement of the probability found in 
Example 4.3.3. Thus 

    $P(X > 5 \mid n = 20,\ p = 0.08) = 1 - P(X \le 5 \mid n = 20,\ p = 0.08) = 1 - 0.9962 = 0.0038$

EXAMPLE 4.3.5 In Example 4.3.3, let us determine 

    $P(2 \le X \le 5 \mid n = 20,\ p = 0.08)$

In this example we want the probability associated with an interval. To obtain
the answer we must find the probability that $X \le 5$ and subtract from it the
probability that $X < 2$ (or $X \le 1$). Therefore when n = 20 and p = 0.08,

    $P(2 \le X \le 5) = P(X \le 5) - P(X \le 1) = 0.9962 - 0.5169 = 0.4793$
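
When a printed table is not at hand, the cumulative probabilities used in Examples 4.3.3 through 4.3.5 can be computed by summing Equation 4.3.1. The sketch below is an illustration under stated assumptions: the function name `binomial_cdf` is introduced here and is not a reference to Table A or to any particular library.

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for a binomial random variable, summing f(0) through f(x)."""
    q = 1 - p
    return sum(comb(n, k) * q ** (n - k) * p ** k for k in range(x + 1))

n, p = 20, 0.08   # 20 claims, 8% of which are for burglary damages

at_most_5 = binomial_cdf(5, n, p)                  # Example 4.3.3
more_than_5 = 1 - at_most_5                        # Example 4.3.4 (complement)
two_to_five = at_most_5 - binomial_cdf(1, n, p)    # Example 4.3.5 (interval)

print(round(at_most_5, 4), round(more_than_5, 4), round(two_to_five, 4))
```

The interval probability subtracts the cumulative probability at 1, not at 2, so that the value 2 itself stays inside the interval.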

EXAMPLE 4.3.6 In Example 4.3.2, suppose that the researcher calls a random 
sample of 12 households in a community on the night that 85% of the households 
have someone at home. Use Table A to find the probability that the person conducting
the telephone survey finds someone at home in exactly 7 households.

Table A does not give probabilities for values of p greater than 0.5. We can 
find the probability, however, by restating the problem as follows: What is the 
probability that the person conducting the telephone survey gets no answer from 



exactly 5 calls out of 12, if no one is at home in 15% of the households? We find
the answer as follows:

    $P(X = 5 \mid n = 12,\ p = 0.15) = P(X \le 5) - P(X \le 4) = 0.9954 - 0.9761 = 0.0193$

This is the same answer we obtained previously. Thus the probability of finding
someone at home in exactly 7 households is equal to the probability of finding no
one at home in exactly 5 households, given the conditions specified in the example.

The binomial distribution is really a family of distributions. Each different value
of either n or p specifies a different distribution. In this distribution n and p are
called parameters. Figure 4.3.2 shows how the binomial distribution varies for
different values of p and n. Regardless of the value of n, the distribution is
symmetric when p = 0.5. When p is greater than 0.5, the distribution is asymmetric
and the peak occurs to the right of center. When p is less than 0.5, the
distribution is asymmetric and the peak occurs to the left of center.

In theory, the binomial distribution can be applied only when the sample is
drawn from an infinite population, or from a finite population when sampling is
with replacement. (When sampling is with replacement, a selected unit is returned
to the population before the next unit is selected.)

In practice, samples are usually drawn from finite populations. Therefore the
question naturally arises as to whether the binomial distribution is appropriate,
given this circumstance. The answer depends on how constant p remains as succeeding
observations are drawn. It is generally agreed that when n is small relative
to the population size N, the binomial model is appropriate. That is, the constancy
of p is not seriously affected. Some writers say that n is small relative to N if N
is at least 10 times as large as n.

The Mean and Variance of the Binomial Distribution

The mean and variance of the binomial distribution are

    $\mu = np$    (4.3.2)

and

    $\sigma^2 = np(1 - p)$    (4.3.3)

respectively, where n is the number of trials, p is the probability of a success for
each trial, and the trials are independent. Thus we find the mean of the binomial
distribution by multiplying the number of trials by the probability of a success on
an individual trial. In other words, we expect, in the long run, to observe np
successes out of n Bernoulli trials.

To find the variance of the binomial, as indicated by Equation 4.3.3, we multiply
the number of trials n by the probability of a success. We then multiply this
product by the probability of a failure.

Use of Chebyshev's Theorem

Chapter 2 showed how to use Chebyshev’s theorem to calculate the proportion of
values, in a set of data, that we can expect to fall within a specified distance (as
measured in standard deviations) of the mean. The following is a statement of
this theorem in probabilistic terms.

Given the probability distribution of the random variable X with mean μ and 
standard deviation σ, the probability of observing a value of X within k standard 
deviations of μ is at least 1 - 1/k². 



Alternatively, we may state Chebyshev’s theorem as follows: 

Given the probability distribution of the random variable X with mean μ and 
standard deviation σ, the probability of observing a value of X that differs 
from μ by k or more standard deviations cannot exceed 1/k². 

EXAMPLE 4.3.7 In a certain population, 60% are said to prefer a particular brand 
of toothpaste. We interview a random sample of 500 persons from this population. 
Within what interval would we expect the number of successes (persons who 
prefer the particular brand of toothpaste) out of these 500 trials to lie with a 
probability of 0.96? 

Since 1 - 1/k² = 0.96 when k = 5, the probability is at least 0.96 that the 
number of successes we would observe is within five standard deviations of the 
mean. The probability that a given person prefers the brand of toothpaste is 0.6. 
And the number of successes out of 500 trials (interviews) is a random variable 
having a binomial distribution with n = 500 and p = 0.6. From Equations 4.3.2 
and 4.3.3, we find the mean and standard deviation to be 

μ = np = (500)(0.6) = 300 

and 

σ = √(np(1 - p)) = √((500)(0.6)(0.4)) = 10.95 

Since 5(10.95) = 54.75, the interval we want is 300 ± 54.75, or about 245 to 
355. 
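
The arithmetic of Example 4.3.7 can be reproduced in a few lines. This is a minimal sketch under the example's stated conditions; the variable names (`target`, `k`, `lower`, `upper`) are illustrative and not from the text.

```python
from math import sqrt

n, p = 500, 0.6
target = 0.96                      # required lower bound on the probability

k = 1 / sqrt(1 - target)           # solve 1 - 1/k**2 = target for k
mean = n * p                       # Equation 4.3.2
sd = sqrt(n * p * (1 - p))         # square root of Equation 4.3.3

lower, upper = mean - k * sd, mean + k * sd
print(f"k = {k:.2f}, mean = {mean:.0f}, sd = {sd:.2f}")
print(f"interval: {lower:.2f} to {upper:.2f}")
```

Changing `target` shows how quickly the interval widens as the required probability approaches 1.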

Suppose that we find that only 240 out of the 500 prefer the brand of toothpaste. 
What conclusion might we draw from this? We might question the truth of the 
statement that 60% of the population prefers the brand, since 240 is more than 5 
standard deviations from the mean. Chebyshev’s theorem tells us that the 
probability of this occurring is 1/k² = 1/5² = 0.04 or less. 

Chebyshev’s theorem applies to all random variables. However, it may provide 
weak information for the specific variable of interest. For many random variables, 
the probability of observing a value within two standard deviations of the mean 
is far greater than 1 - 1/2² = 0.75. We have devoted so much time to the 
theorem in order to shed some light on the nature and importance of the standard 
deviation as a measure of dispersion. 
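
To see how conservative the bound can be, the sketch below compares it with exact binomial probabilities for the toothpaste example (n = 500, p = 0.6). It is an illustration only; the helper name `prob_within` is an assumption introduced here.

```python
from math import comb, sqrt

n, p = 500, 0.6
mean, sd = n * p, sqrt(n * p * (1 - p))


def prob_within(k: float) -> float:
    """Exact binomial probability that X lies within k standard deviations of the mean."""
    return sum(
        comb(n, x) * p**x * (1 - p) ** (n - x)
        for x in range(n + 1)
        if abs(x - mean) <= k * sd
    )


for k in (2, 3, 5):
    bound = 1 - 1 / k**2           # Chebyshev's guaranteed minimum
    print(f"k = {k}: Chebyshev bound {bound:.3f}, exact binomial {prob_within(k):.4f}")
```

For each k the exact probability should exceed the bound, usually by a wide margin, which is the sense in which the theorem gives weak information for a specific distribution.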


(In each of the following exercises, assume that N is sufficiently large relative to n, and 
use the binomial distribution to find the desired probabilities.) 

4.3.1 Over a long period of time, a salesperson has found that the probability of making 
a sale when calling on a customer is 0.5. If this salesperson calls on 5 customers on a 
given day, find the probability of making (a) exactly 3 sales, (b) 3 or more sales, (c) 
fewer than 3 sales, (d) no sales, (e) 5 sales. 

4.3.2 For a certain group of people, it is estimated that 40% use a particular type of credit 
card primarily for installment buying. Suppose that this estimate is correct, and 25 persons 
picked at random from this group are questioned on the matter. What is the probability 
that the number using their credit cards in this manner is (a) 5 or fewer? (b) between 10 
and 15, inclusive? (c) 15 or more? 

4.3.3 Given n = 6, p = 0.2, find (a) P(X > 3), (b) P(X < 4), (c) P(2 < X < 4). 

4.3.4 Given n = 15, p = 0.3, determine (a) P(X > 10), (b) P(X < 5), (c) P(X > 12), 
(d) P(1 < X < 10). 

4.3.5 Refer to Exercise 4.3.2. Suppose that we review the records of 1000 holders of the 
credit card and find that 500 of them use their cards primarily for installment buying. Is 
this sufficient evidence to indicate that the estimate of the true number using their credit 
cards primarily for installment buying should be revised? [Hint: Use Chebyshev’s 
theorem.] 

4.3.6 In a certain town, 35% of the residents are opposed to the widening of Main Street. 
In a simple random sample of 20 residents, what is the probability that the number opposed 
to the widening is (a) more than 10? (b) between 15 and 18, inclusive? (c) fewer than 8? 
(d) at least 12? (e) no more than 13? 

4.3.7 Suppose that 72% of a certain population of drivers regularly use seat belts. You 
take a simple random sample of 15 of these drivers. What is the probability that the number 
regularly using seat belts is (a) more than 10? (b) fewer than 8? (c) at least 11? (d) 7 or 
more? 


4.4 THE POISSON DISTRIBUTION 


A second important discrete distribution is the Poisson distribution. It is named 
for the French mathematician Siméon Denis Poisson (1781-1840), who published 
its derivation in 1837 [Haight (1967)]. Although the distribution bears Poisson’s 
name, some writers, including David (1962) and Newbold (1927), credit its 
discovery to deMoivre. Haight (1967) credits the Russian mathematician Ladislaus 
von Bortkiewicz with first recognizing its statistical importance. Bortkiewicz 
applied the Poisson distribution to the study of deaths of Prussian soldiers from 
horses’ kicks. 

The Poisson distribution is given by 

$$f(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots \qquad (4.4.1)$$

This formula represents the probability that a discrete variable X assumes the value 
x. That is, f(x) = P(X = x). The Greek letter λ (lambda) is called the parameter 
of the distribution. It is the average number of occurrences of a random event in 
an interval of time or space. The number of occurrences of the random event in 
the interval is indicated by x. The symbol e is the constant (to four decimals) 
2.7183. 

The Poisson distribution has been extensively tabulated. Table B gives the 
cumulative Poisson distribution for various values of λ. This table gives the 
probability that X is less than or equal to some specified value x₀ for a distribution 
with a given λ. It can be shown that f(x) ≥ 0 for every x, and that Σf(x) = 1. 
Thus the distribution satisfies the requirements for a probability distribution. 
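
The two requirements just stated can be checked numerically. The sketch below assumes the usual Poisson formula, f(x) = e^(-λ) λ^x / x!; the function names and the value λ = 2 are illustrative choices, not taken from the text, and the infinite sum is truncated where the remaining terms are negligible.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Return f(x) = e^(-lam) * lam^x / x!, the Poisson probability P(X = x)."""
    return exp(-lam) * lam**x / factorial(x)


def poisson_cdf(x0: int, lam: float) -> float:
    """Return P(X <= x0), the cumulative quantity that Table B tabulates."""
    return sum(poisson_pmf(x, lam) for x in range(x0 + 1))


lam = 2.0  # illustrative value, not taken from the text
total = poisson_cdf(60, lam)  # sum over x = 0..60; later terms are negligible
print(f"sum of f(x) for x = 0..60: {total:.12f}")
```

Here `poisson_cdf` stands in for a lookup in Table B, so it can be reused wherever the text reads a cumulative value from the table.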


FIGURE 4.4.1 Poisson distribution for selected values of λ 

Figure 4.4.1 shows the form of the Poisson distribution for selected values 
of λ. 

The Poisson distribution applies in many areas. Here are some examples. 

1. The demand for a product 

2. Typographical errors in a book 

3. The occurrence of accidents in a factory 

4. The pattern of arrival of store customers at a check-out counter 

5. The occurrence of flaws in a bolt of fabric 

6. The emission of radioactive particles 

7. The arrival of calls at a switchboard 

The Poisson Process 

Random occurrences of some event that follow the Poisson distribution are said 
to be brought about by the Poisson process, whose characteristics are: 

1. The occurrences of the events are independent. That is, the occurrence of an 
event in an interval of space or time has no effect on the probability of a second 
occurrence of the event in the same, or any other, interval. 

2. Theoretically, an infinite number of occurrences of the event must be possible 
in the interval. 

3. The probability of a single occurrence of the event in a given interval is 
proportional to the length of the interval. 

4. In any infinitesimally small portion of the interval, the probability of more 
than one occurrence of the event is negligible. 

When the conditions of a Poisson process are reasonably satisfied, we can view 
the determination of whether or not an event has occurred as a Bernoulli trial. 
For example, suppose that we want to use the Poisson model to study the demand 
for a certain item sold in a retail store. We observe the store’s sales for a short 
period of time. And we assume that there can be no more than one call for the 
item during this time period. In each observed time period, there will be either a 
demand (success) or no demand (failure) for the item. 

An interesting characteristic of the Poisson distribution is the fact that the mean 
and variance are both equal to λ. 
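
One informal way to see why the mean and variance coincide is to treat the interval as m short subintervals, each a Bernoulli trial with success probability λ/m, in the spirit of the retail example above. The sketch below makes that subdivision assumption explicit and applies Equations 4.3.2 and 4.3.3; the names `lam` and `m` and the value 0.3 are illustrative.

```python
lam = 0.3                        # illustrative average number of occurrences per interval

for m in (10, 100, 10_000):      # number of short subintervals (Bernoulli trials)
    p = lam / m                  # probability of one occurrence in a subinterval
    mean = m * p                 # binomial mean, Equation 4.3.2
    variance = m * p * (1 - p)   # binomial variance, Equation 4.3.3
    print(f"m = {m:>6}: mean = {mean:.4f}, variance = {variance:.6f}")
```

The mean stays at λ for every m, while the variance λ(1 - λ/m) approaches λ as the subintervals become shorter.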


EXAMPLE 4.4.1 A long-term study of accidents in a shoe factory led management 
to conclude that the number of accidents per person during a year (X) is 
distributed according to the Poisson law. The average number of accidents per person 
per year was 0.3. What is the probability that a randomly selected employee will 
not have an accident during the coming year? 

We need to evaluate Equation 4.4.1 for x = 0. That is, we need to evaluate 

$$P(X = 0) = f(0) = \frac{e^{-0.3}\,(0.3)^{0}}{0!}$$

If we enter Table B with x = 0 and λ = 0.3, we find the desired probability to 
be 0.741. 

What is the probability that a randomly selected employee will have at least 
one accident during the coming year? 

What we need here is the complement of the probability just obtained. That is, 

P(X ≥ 1) = 1 - P(X = 0) = 0.259 

What is the probability that an employee will have exactly one accident? 

To answer this question, we find 

P(X = 1) = P(X ≤ 1) - P(X = 0) = 0.963 - 0.741 = 0.222 
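
The three answers in Example 4.4.1 can also be obtained by evaluating the Poisson formula directly instead of reading Table B. A minimal sketch, assuming the usual formula f(x) = e^(-λ) λ^x / x!; the variable names are illustrative.

```python
from math import exp, factorial

lam = 0.3  # average number of accidents per person per year


def f(x: int) -> float:
    """Poisson probability P(X = x) for the given lam."""
    return exp(-lam) * lam**x / factorial(x)


p_none = f(0)                 # no accident
p_at_least_one = 1 - p_none   # complement of "no accident"
p_exactly_one = f(1)          # same as P(X <= 1) - P(X = 0)

print(f"{p_none:.3f} {p_at_least_one:.3f} {p_exactly_one:.3f}")
```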


EXAMPLE 4.4.2 Assume that the number of cars arriving at a certain freeway 
entrance ramp is Poisson distributed. If the assumption is correct, and if the 
average number of cars arriving during an hour is 5, what is the probability that 
in a given hour no cars will arrive at the ramp? 

To answer this question, we need to evaluate 

$$f(0) = \frac{e^{-5}\,5^{0}}{0!}$$

We enter Table B with x = 0 and λ = 5, and find the probability to be 0.007. 


What is the probability that exactly 5 cars will arrive in an hour? We use Table 
B and proceed as follows: 

P(X = 5) = P(X ≤ 5) - P(X ≤ 4) = 0.616 - 0.440 = 0.176 

What is the probability that more than 5 cars will arrive in an hour? Again we 
use Table B to find 

P(X > 5) = 1 - P(X ≤ 5) = 1 - 0.616 = 0.384 
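
The cumulative values read from Table B for λ = 5 can be regenerated as a running sum of Poisson probabilities. This is a sketch of that idea rather than a reproduction of the table; the list names `pmf` and `cdf` are illustrative.

```python
from itertools import accumulate
from math import exp, factorial

lam = 5  # average number of cars arriving per hour

# Cumulative probabilities P(X <= x) for x = 0, 1, ..., 10, laid out like one column of Table B.
pmf = [exp(-lam) * lam**x / factorial(x) for x in range(11)]
cdf = list(accumulate(pmf))

p_no_cars = cdf[0]
p_exactly_5 = cdf[5] - cdf[4]
p_more_than_5 = 1 - cdf[5]

for x, c in enumerate(cdf):
    print(f"x = {x:2d}  P(X <= x) = {c:.3f}")
print(f"{p_no_cars:.3f} {p_exactly_5:.3f} {p_more_than_5:.3f}")
```

Small differences in the third decimal place from the table-based answers can arise because the table entries are rounded before they are subtracted.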


The Poisson Distribution as an Approximation to the Binomial Distribution 

We can use the Poisson distribution to approximate the binomial distribution when 
n is large and p is small. When these conditions are met, we can express the 
relationship as follows: 

$$\binom{n}{x} p^{x} q^{n-x} \approx \frac{e^{-np}(np)^{x}}{x!} \qquad (4.4.2)$$

where q = 1 - p. This expression shows that when the Poisson distribution is used 
to approximate the binomial distribution, the value of λ is taken to be np. What is 
meant by large n and small p is not precise. A generally accepted rule of thumb is 
that the approximation may be used if n ≥ 20 and p ≤ 0.05. However, this is only a 
rough guideline. Good approximations have been obtained for smaller n and larger 
p values. 

EXAMPLE 4.4.3 Suppose that the probability that a certain type of seed does not 
germinate is 0.04. If 25 of these seeds are planted, what is the probability that 5 
or fewer do not germinate? 

If we use the binomial distribution, we find that when n = 25 and p = 0.04, 

P(X ≤ 5) = 0.9996 

Now let us use the Poisson distribution. We let λ = np = (25)(0.04) = 1. 
We then find from Table B that 

P(X ≤ 5) = 0.999 

EXAMPLE 4.4.4 Consider Example 4.3.3, in which p = 0.08 = the proportion of 
insurance claims that are for burglary damages. The insurance company wants to 
determine the probability that of a random sample of 20 claims, 5 or more are 
for burglary damages. 

Using the binomial distribution, we found an answer of 0.0183. Now we let 
λ = np = (0.08)(20) = 1.6 and use the Poisson distribution. We find from Table 
B that 

P(X ≥ 5) = 0.024 

Thus we see that even with p > 0.05, the results of the two methods are close. They 
are the same when the answers are rounded to two decimal places. 
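
The quality of the approximation in Equation 4.4.2 can be checked by computing both sides for the two examples in this section (n = 25, p = 0.04 and n = 20, p = 0.08). The sketch below is illustrative; the helper names are assumptions, and λ is set to np as the text describes.

```python
from math import comb, exp, factorial


def binom_cdf(x0: int, n: int, p: float) -> float:
    """Exact binomial P(X <= x0)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(x0 + 1))


def poisson_cdf(x0: int, lam: float) -> float:
    """Poisson P(X <= x0) with parameter lam."""
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(x0 + 1))


# Example 4.4.3: n = 25, p = 0.04, P(X <= 5)
exact_443 = binom_cdf(5, 25, 0.04)
approx_443 = poisson_cdf(5, 25 * 0.04)

# Example 4.4.4: n = 20, p = 0.08, P(X >= 5) = 1 - P(X <= 4)
exact_444 = 1 - binom_cdf(4, 20, 0.08)
approx_444 = 1 - poisson_cdf(4, 20 * 0.08)

print(f"Example 4.4.3: binomial {exact_443:.4f}, Poisson {approx_443:.4f}")
print(f"Example 4.4.4: binomial {exact_444:.4f}, Poisson {approx_444:.4f}")
```

Trying larger p or smaller n in the same helpers shows where the two distributions begin to diverge noticeably.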

4.4.1 The number of defects per square foot of a certain manufactured fabric is Poisson 
distributed with λ = 0.08. If a square foot of fabric is inspected, what is the probability 
that the number of defects observed is (a) zero? (b) at least 1? (c) exactly 2? 

4.4.2 Suppose that the number of persons per car arriving at the entrance to an amusement 
park is Poisson distributed with λ = 2.4. What is the probability that a car arriving at the 
entrance contains (a) no persons? (b) only 1 person? (c) more than 1 person? (d) 8 persons? 
(e) more than 8 persons? 

4.4.3 Of the items produced in a certain factory, 3% are defective. A sample of 25 items 
is selected for inspection. Use both the binomial and the Poisson distributions to answer 
the following questions, and compare the results. (a) What is the probability that exactly 
4 defectives are found? (b) What is the probability that 3 or more defectives are found? 

4.4.4 In a certain resort area, the number of vacant motel rooms follows a Poisson 
distribution. The expected vacancy rate is 10 rooms per night. Find the probability that on a 
given night the number of vacant rooms will be: (a) none, (b) 1, (c) 7, (d) 6 or more, (e) 
5 or fewer, (f) between 5 and 12, inclusive, (g) more than 12. 

4.4.5 In a certain population, 6% suffer from emotional disorders. In a simple random 
sample of 60, what is the probability that the number suffering from emotional disorders 
will be (a) more than 6? (b) at least 10? (c) between 5 and 15, inclusive? 

4.4.6 In a certain population of executives, 55% are overweight. In a simple random 
sample of 40 of these executives, what is the probability that the number who are 
overweight will be (a) more than 25? (b) at least 30? (c) 35 or more? 


4.5 THE HYPERGEOMETRIC DISTRIBUTION 

In Section 4.3 we learned that if analysis based on the binomial model is to be 
valid, the probability of a success p must remain constant throughout the sampling 
operation. That is, p must be the same each time we select an element from the 
population. As we noted, this condition is met when sampling is from an infinite 
population. If the sampled population is finite, p will not be the same on successive 
drawings of elements if sampling is without replacement. When sampling is 
without replacement, the population is decreased by 1 each time we select an element. 

For example, suppose that we have a population of 27 people, of whom 15 are 
males and 12 are females. Suppose further that we wish to draw a random sample 
of size 5 from this population. Before we select the first person, the probability 
that this person will be a female (a success) is p = 12/27. After we select the 
first person but before we select the second person, the probability that the person 
we select will be a female is either 12/26 or 11/26, neither of which is equal to 
12/27. The probability that the second person we select will be female is p = 
12/26 if the first person we select is male and 11/26 if the first person we select 
is female. Figure 4.5.1 shows sampling without replacement from a population 
of 15 males and 12 females. 

As noted in Section 4.3, when we sample from very large finite populations, 
we usually need not be concerned with the fact that the binomial model is not 
strictly valid. If the sample contains only a small proportion of the finite 
population, say 10% or less, the binomial model usually yields results that closely 
approximate those that we would realize if the population were infinite. 

We may have a problem if we try to use the binomial model with a very small 
population or with a sample that contains a large proportion of the population. 
One solution would be to sample with replacement. That is, we could return each 
element to the population after we drew and examined it. This procedure would 
ensure a constant value of p, the proportion of successes, throughout the sampling 
operation. It would be equivalent to sampling from an infinite population. In 
situations of practical importance, however, sampling with replacement is difficult 
to justify. Consequently we conduct most sampling operations without 
replacement. 

Fortunately we have another model available for many of the situations in which 
we cannot use the binomial model. We can use this model, known as the 
hypergeometric model, when we need to sample without replacement from small 
populations. In fact, the hypergeometric model is appropriate any time we are 
sampling without replacement from a finite population and wish to determine the 
probability of a specified number of successes and/or failures. The following 
example illustrates the use of the hypergeometric model. 

EXAMPLE 4.5.1 A carton of 6 flashlight batteries contains 2 that are defective (D) 
and 4 that are nondefective (N). If we select 3 batteries at random from the carton, 
what is the probability that the sample contains exactly 1 defective battery? 

The tree diagram in Figure 4.5.2 shows the ways in which we can obtain 1 
defective item when we sample from a population of size 6 that contains 2 
defective and 4 nondefective items. Not all possible samples of size 3 will contain 
exactly 1 defective. The heavy branches in Figure 4.5.2 represent the samples 
with exactly 1 defective. Also the figure shows the composition of the carton 
before each draw. It shows the probability associated with each outcome in 
parentheses on the branch representing the outcome. 

There are three ways to obtain a sample with exactly one defective. Each has 
the same probability, 0.200, of occurring. A single sample drawn from the 
population may be the one shown as DNN, the one shown as NDN, or the one shown 
as NND. By the addition rule, the probability of obtaining one of these three 
outcomes is 0.200 + 0.200 + 0.200 = 0.600, since the outcomes are mutually 
exclusive. 
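
The tree-diagram reasoning can be verified by brute force: list every ordered draw of 3 batteries from the carton and count those with exactly one defective. The sketch below does this with exact fractions; the list `carton` and the other names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

carton = ["D", "D", "N", "N", "N", "N"]   # 2 defective, 4 nondefective

# Every ordered draw of 3 distinct batteries is equally likely.
draws = list(permutations(range(len(carton)), 3))
one_defective = [d for d in draws if sum(carton[i] == "D" for i in d) == 1]

prob = Fraction(len(one_defective), len(draws))
print(prob, float(prob))   # exact fraction and its decimal value

# Probability of one particular branch of the tree, D then N then N:
branch_dnn = Fraction(2, 6) * Fraction(4, 5) * Fraction(3, 4)
print(branch_dnn)
```

Three branches with the probability of `branch_dnn` add up to `prob`, which is the addition rule applied above.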

We can describe the type of sampling situation illustrated in Example 4.5.1 and 
Figure 4.5.2 in general terms as follows: 

From a population of size N consisting of N₁ elements of one type (successes) 
and N₂ elements of another type (failures), select at random and without 
replacement a sample of size n. Determine the probability P(X = x) that the 
sample will contain x successes. 

The formula for P(x) under these conditions is 

$$P(x) = \frac{\binom{N_1}{x}\binom{N_2}{n-x}}{\binom{N}{n}} \qquad (4.5.1)$$

To illustrate the use of Equation 4.5.1, let us refer again to Example 4.5.1, in 
which N₁ = 2, N₂ = 4, N = 2 + 4 = 6, n = 3, and x = 1. Proper substitution 
into the formula gives 

$$P(1) = \frac{\binom{2}{1}\binom{4}{2}}{\binom{6}{3}} = \frac{(2)(6)}{20} = 0.600$$

This result agrees with that obtained earlier. 
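
The substitution can be carried out with a small helper. This sketch assumes the standard hypergeometric formula, in which the number of favorable samples, C(N₁, x)·C(N₂, n - x), is divided by the number of all possible samples, C(N, n); the function name `hypergeom_pmf` and its parameter names are illustrative.

```python
from math import comb


def hypergeom_pmf(x: int, n1: int, n2: int, n: int) -> float:
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of n1 successes and n2 failures."""
    return comb(n1, x) * comb(n2, n - x) / comb(n1 + n2, n)


# Example 4.5.1: 2 defective, 4 nondefective, sample of 3, exactly 1 defective
print(hypergeom_pmf(1, n1=2, n2=4, n=3))
```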


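Mean and Variance of the Hypergeometric Distribution 

The mean and variance of the hypergeometric distribution are, respectively, 

$$\mu = E(X) = n\,\frac{N_1}{N} \qquad (4.5.2)$$

and 

$$\sigma^2 = n \cdot \frac{N_1}{N} \cdot \frac{N_2}{N} \cdot \frac{N - n}{N - 1} \qquad (4.5.3)$$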
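
As a check, the mean and variance of the battery example (N₁ = 2, N₂ = 4, n = 3) can be computed from the full distribution and compared with the closed forms. The sketch assumes the standard hypergeometric results, mean n·N₁/N and variance n·(N₁/N)·(N₂/N)·(N - n)/(N - 1); all variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n1, n2, n = 2, 4, 3          # the carton of Example 4.5.1
big_n = n1 + n2

# Exact distribution of X, the number of defectives in the sample.
pmf = {
    x: Fraction(comb(n1, x) * comb(n2, n - x), comb(big_n, n))
    for x in range(min(n1, n) + 1)
}
mean = sum(x * f for x, f in pmf.items())
variance = sum((x - mean) ** 2 * f for x, f in pmf.items())

# Closed forms for comparison (standard hypergeometric results).
mean_formula = Fraction(n * n1, big_n)
variance_formula = n * Fraction(n1, big_n) * Fraction(n2, big_n) * Fraction(big_n - n, big_n - 1)

print(mean, variance)                     # computed from the distribution
print(mean_formula, variance_formula)     # computed from the closed forms
```

The factor (N - n)/(N - 1) is what distinguishes this variance from the binomial variance np(1 - p) with p = N₁/N.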


Exercises 



4.5.1 A population consists of 8 persons, of whom 3 are married and 5 are single. Suppose 
that we draw a sample of size 4 without replacement from this population. What is the 
probability that exactly 2 members of the sample will be married? 

4.5.2 A population consists of 7 people, of whom 3 drive foreign cars and 4 drive 
American-made cars. Suppose that we draw a sample of size 3 from the population. What is 
the probability that the sample will contain exactly 1 person who drives a foreign car? 
What is the probability that the sample will contain at least 1 person who drives a foreign 
car? 

4.5.3 A box contains 10 two-inch screws, of which 4 have a Phillips head and 6 have a 
regular head. Suppose that we select 4 screws randomly, without replacement, from the 
box. What is the probability that the number of Phillips-head screws in the sample will be 
(a) exactly 4? (b) 2 or 3? (c) more than 1? 

4.5.4 A firm employs 6 word-processor operators, of whom 3 are women. You choose 4 
from the 6, at random. What is the probability that of those chosen, the number of women 
will be (a) exactly 2? (b) 2 or more? (c) at least 2? 

4.5.5 A population consists of 9 junior executives, of whom 5 have a master’s degree. 
You select a random sample of 6 from this population. What is the probability that the 
number with a master’s degree will be (a) exactly 3? (b) between 2 and 4, inclusive? (c) 
no more than 3? 


4.6 PROBABILITY DISTRIBUTIONS OF CONTINUOUS RANDOM VARIABLES 

The binomial, Poisson, and hypergeometric distributions are distributions of 
discrete random variables. In this section we consider the general idea of the 
distribution of a continuous random variable. In the following sections we discuss in 
detail the most important special distribution of a continuous random variable, 
the normal distribution. The random variable X is continuous if it can assume all 
possible values between any two particular values x_a and x_b. To help understand 
the nature of the distribution of a continuous random variable, consider the 
following example. 

EXAMPLE 4.6.1 Table 4.6.1 shows the frequency and relative frequency 
distributions of the lengths of a sample of 200 aluminum-coated steel sheets taken from 
a production lot of a certain factory. Since the probability associated with each 
interval of lengths shown in the first column is given by the relative frequency 
column, this column constitutes the probability distribution of the random variable, 
length of aluminum-coated sheet. 


TABLE 4.6.1 Distribution of lengths of 200 aluminum-coated steel sheets 

Length (inches)     Frequency     Relative frequency 
30.000-30.124            8              0.04 
30.125-30.249           20              0.10 
30.250-30.374           32              0.16 
30.375-30.499           40              0.20 
30.500-30.624           36              0.18 
30.625-30.749           34              0.17 
30.750-30.874           20              0.10 
30.875-30.999           10              0.05 
Total                  200              1.00 


We can present the probability distribution as a histogram, as in Figure 4.6.1, 
or as a frequency polygon, as in Figure 4.6.2. The area within each cell of the 
histogram represents a certain proportion of the total area bounded by the 
histogram and the horizontal axis. The proportion of the total area contained within a 
particular cell is equal to the probability of observing a value between the 
boundaries of that cell. For example, Table 4.6.1 shows that the relative frequency, or 
probability, of occurrence of values between 30.000 and 30.125 inches is 0.04. 
The corresponding cell in Figure 4.6.1 has 4% of the total area of the histogram. 

Given any histogram of a probability distribution, we can find the probability 
of occurrence of values between any two points on the horizontal axis. We do 
this by determining what proportion of the total area we enclose when we erect 
vertical lines at these points. For example, to find the probability of occurrence 
of values between 30.500 and 30.625 in Figure 4.6.1, we determine what 
proportion of the histogram’s area is enclosed by vertical lines erected at these points. 
The values 30.500 and 30.625 define a class interval. Since the vertical lines 
erected at these points define a cell of the histogram, we consult Table 4.6.1, and 
we find the proportion of area enclosed to be 0.18. The probability of occurrence 
of values between 30.500 and 30.625, then, is 0.18. 


FIGURE 4.6.1 Histogram of lengths of 200 aluminum-coated steel sheets 

FIGURE 4.6.2 Relative frequency polygon of lengths of 200 aluminum-coated sheets 



To find the area enclosed by two or more cells, we must find the sum of the 
individual areas. This total is equal to the probability associated with the 
corresponding class intervals. For example, to find the area between 30.500 and 30.750 
in Figure 4.6.1, we add the areas of the two cells involved. From Table 4.6.1 we 
find this area to be 0.18 + 0.17 = 0.35. Thus, the probability of occurrence of 
values between 30.500 and 30.750 is 0.35. 
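
Adding cell areas amounts to summing relative frequencies. The sketch below encodes the class intervals and frequencies of Table 4.6.1 and sums the cells that fall inside a requested range; the helper name `prob_between` is an illustrative assumption, and it only handles ranges that line up with cell boundaries.

```python
# Class intervals from Table 4.6.1: (lower bound, upper bound, frequency)
cells = [
    (30.000, 30.124, 8),
    (30.125, 30.249, 20),
    (30.250, 30.374, 32),
    (30.375, 30.499, 40),
    (30.500, 30.624, 36),
    (30.625, 30.749, 34),
    (30.750, 30.874, 20),
    (30.875, 30.999, 10),
]
total = sum(freq for _, _, freq in cells)


def prob_between(low: float, high: float) -> float:
    """Sum the relative frequencies of the cells that lie entirely inside [low, high]."""
    inside = sum(freq for lo, hi, freq in cells if lo >= low and hi <= high)
    return inside / total


print(prob_between(30.500, 30.750))
```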

Now consider a situation in which the number of values of the random variable 
is very large and the width of the class intervals is very small. For example, 
suppose that we have 2000 aluminum-coated steel sheets instead of 200. And 
suppose that we prepare a histogram for these data using much smaller class 
intervals, perhaps 0.010 inch in width. The resulting histogram might look like 
the one in Figure 4.6.3. 

Suppose that we were to construct a frequency polygon from the histogram of 
Figure 4.6.3. The figure would be much smoother than the one in Figure 4.6.2. 


FIGURE 4.6.3 Histogram of lengths of 2000 aluminum-coated steel sheets (much smaller class intervals than in Figure 4.6.1) 

FIGURE 4.6.4 Smooth curve approximating a histogram for data with large n and a large number of class intervals 



In fact, as the number of values n approaches infinity, and the width of the class 
intervals approaches 0, the frequency polygon approaches a smooth curve, such 
as the one shown in Figure 4.6.4. We can use smooth curves as a graphical means 
of representing probability distributions of continuous random variables. 

As with the histogram, the total area under a smooth curve used to represent a 
probability distribution is equal to 1. The probability of occurrence of values 
between any two points on the horizontal axis is equal to the area bounded by 
perpendicular lines erected at these points, the curve itself, and the horizontal 
axis. Figure 4.6.5 shows the graph of the probability distribution of a continuous 
random variable with the area between two points a and b shaded. This area is 
equal to the probability of occurrence of values between a and b. 

We can describe the curve used to represent the probability distribution of a 
continuous random variable, such as that shown in Figure 4.6.5, by a probability 
density function. A probability density function is a formula, or equation, used to 
represent the probability distribution of a continuous random variable. We need 
to represent the distribution of a continuous random variable in this manner. 
Because of the variable’s continuity, we cannot list its possible values along with 
the associated probabilities as we can for a discrete random variable. 

FIGURE 4.6.5 Probability distribution of the continuous random variable X showing P(a ≤ x ≤ b) 

We cannot find areas under a smooth curve the same way we find areas under 
a histogram. A smooth curve does not have delineated subareas corresponding to 
the cells of a histogram. We use integral calculus to find subareas under smooth 
curves. That is, to find the area between points a and b in Figure 4.6.5, we 
integrate the probability density function using a and b as the limits of integration. 
Methods of integral calculus are beyond the scope of this text. However, this is 
not a serious problem, since tables of values obtained by integration are available 
for those probability density functions of interest to us. 
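
For readers curious about what such tables contain, an area under a density curve can be approximated numerically by adding up many thin rectangles. The sketch below uses a hypothetical density, f(x) = 2x on the interval from 0 to 1, chosen only because its areas are easy to verify by hand; neither the density nor the helper names come from the text.

```python
def density(x: float) -> float:
    """Hypothetical density f(x) = 2x on [0, 1] (zero elsewhere); not from the text."""
    return 2 * x if 0 <= x <= 1 else 0.0


def area(f, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the integral of f from a to b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


print(round(area(density, 0.0, 1.0), 6))    # total area under the curve
print(round(area(density, 0.25, 0.75), 6))  # P(0.25 <= X <= 0.75)
```

The first value confirms that the total area is 1, as any density requires; the second is the probability of a value between the two chosen points.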

In the preceding discussion, we have implied the definition of a probability 
distribution of a continuous random variable. In a more compact form, it is as 
follows: 

A function f(x), where f(x) ≥ 0, is called a probability distribution (or 
probability density function) of the continuous random variable X if the total area 
bounded by its curve and the x axis is equal to 1, and if the subarea under the 
curve bounded by the curve, the x axis, and perpendiculars erected at any two 
points a and b gives the probability that X is between points a and b. 


4.7 THE NORMAL DISTRIBUTION 

We now come to the most important distribution in statistics: the normal 
distribution. The formula for this distribution was first published by Abraham deMoivre 
(1667-1754) in 1733. [See Pearson (1924) and Walker (1933-34).] Other 
mathematicians linked with the history of the normal distribution are Pierre Simon, 
Marquis de Laplace (1749-1827) and Carl Friedrich Gauss (1777-1855), in whose 
honor it is sometimes called the Gaussian distribution. 

The normal density function is given by 

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^{2}/2\sigma^{2}}, \qquad -\infty < x < \infty \qquad (4.7.1)$$

where π and e are the familiar constants 3.14159 and 2.71828, respectively. The 
distribution has two parameters: μ, the mean, and σ, the standard deviation. The 
graph of the normal distribution is the familiar bell-shaped curve shown in Figure 
4.7.1. 

The following are some important characteristics of the normal distribution: 

1. It is symmetrical about its mean μ. As seen in Figure 4.7.1, the curve on 
either side of μ is a mirror image of the other side. 

2. The mean, the median, and the mode are all equal. 


3. The total area under the curve above the x axis is equal to 1. Because of the 
symmetry of the normal curve, 50% of the area is to the right of a perpendicular 
line erected at the mean, and 50% is to the left. 

4. Suppose that we erect vertical lines one standard deviation from the mean in 
each direction. The area enclosed by these lines, the x axis, and the curve will be 
equal to approximately 68% of the total area. If we erect these lateral boundaries 
two standard deviations from the mean in each direction, they will enclose 
approximately 95% of the area. Perpendiculars erected three standard deviations on 
either side of the mean will enclose approximately 99.7% of the total area. Figure 
4.7.2 illustrates these approximate areas. 

A normally distributed random variable is an example of a case in which 
Chebyshev’s theorem provides weak information regarding the probability of 
observing a value within specified distances of the mean. Instead of the probabilities of 
0.95 and 0.997, Chebyshev’s theorem leads to probabilities of at least 1 - 1/2² 
= 0.75 and at least 1 - 1/3² = 0.889, respectively, of observing a value within 
two and three standard deviations of the mean. Chebyshev’s theorem gives no 
information at all about the probability of observing a value within one standard 
deviation of the mean, since 1 - 1/k² = 0 when k = 1. Thus, if we know that 
a random variable is normally distributed, we can make more powerful probability 
statements than we could using Chebyshev’s theorem. 


5. The normal distribution is completely determined by its parameters μ and σ. 
That is, each different value of μ or σ specifies a different normal distribution. 
Figure 4.7.3 shows how different values of μ cause the graph of the distribution 
to be shifted along the x axis. Figure 4.7.4 shows how different values of the 
standard deviation σ, which is a measure of dispersion, determine the flatness or 
peakedness of the graph of the distribution. 

FIGURE 4.7.2 Normal distributions, showing areas bounded by perpendiculars erected a distance of one, two, and three standard deviations on either side of the mean (areas are approximate) 

FIGURE 4.7.3 Three normal distributions with different means 

The normal distribution is really a family of distributions in which one member 
is distinguished from another on the basis of the values of μ and σ. In other 
words, as already indicated, there is a different normal distribution for each 
different value of either μ or σ. 

The Standard Normal Distribution 

The most important member of this family of distributions is the one that has 
a mean of 0 and a standard deviation of 1. This distribution is called the standard 
normal distribution. We can obtain it from Equation 4.7.1 by letting $\mu = 0$ and 
$\sigma = 1$. We usually use the letter $z$ for the random variable that results. 
Consequently the equation for the standard normal distribution is written 

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty \tag{4.7.2}$$

Figure 4.7.5 shows the graph of the standard normal distribution. 


FIGURE 4.7.4 Three normal distributions with different standard deviations 



FIGURE 4.7.5 The standard normal distribution 



The probability that $z$ lies between any two points on the $z$ axis, say $z_0$ and $z_1$, 
is determined by the area bounded by perpendiculars erected at each of these 
points, the curve, and the horizontal axis. We find areas under the curve of a 
continuous distribution by integrating the function between two values of the 
variable. Thus, to find the area between $z_0$ and $z_1$ of the standard normal 
distribution, we must evaluate the following integral: 

$$\int_{z_0}^{z_1} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$$

There are tables that give the results of integrations in which we might be 
interested. Therefore we do not need to perform the integration. Table C of the 
Appendix, for example, gives the areas under the standard normal curve between 
$-\infty$ and the values of $z$ shown in the marginal column of the table. The shaded 
area in Figure 4.7.6 represents the area listed in the body of Table C as being 
between $-\infty$ and $z = z_0$. The following examples illustrate the use of Table C. 
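
The link between the integral and the tabulated areas can be sketched in code. This is an illustration only: it assumes Python's `statistics.NormalDist().cdf` as a stand-in for Table C, and the names `std_normal_pdf`, `area_between`, and `table_c` are hypothetical helpers, not part of the text.

```python
import math
from statistics import NormalDist


def std_normal_pdf(z):
    """Density of the standard normal distribution (Equation 4.7.2)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def area_between(z0, z1, steps=10_000):
    """Midpoint-rule approximation of the area under the curve between z0 and z1."""
    width = (z1 - z0) / steps
    return sum(std_normal_pdf(z0 + (i + 0.5) * width) for i in range(steps)) * width


def table_c(z):
    """Area under the standard normal curve from minus infinity to z (stand-in for Table C)."""
    return NormalDist().cdf(z)


# The area between two points equals the difference of two cumulative areas.
print(round(area_between(-1.0, 1.0), 4))
print(round(table_c(1.0) - table_c(-1.0), 4))
```

Both lines print the same value to four decimals, which is why a table of cumulative areas is enough to avoid integrating by hand.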

EXAMPLE 4.7.1 Given the standard normal distribution, find the area under the 
curve above the $z$ axis between $z = -\infty$ and $z = 2.5$. 

The area is shaded in Figure 4.7.7. Locating $z = 2.5$ in Table C and reading 
the corresponding entry in the body of the table, we find the desired area to be 
0.9938. We can interpret this in several ways. It is the probability that a $z$ picked 
at random from the population of $z$’s will have a value between $-\infty$ and 2.5. It 
is also the relative frequency of occurrence (or proportion) of values of $z$ between 


FIGURE 4.7.6 Standard normal distribution showing area between $-\infty$ and $z_0$ 




FIGURE 4.7.7 Standard normal distribution showing area between $-\infty$ and $z = 2.5$ 


FIGURE 4.7.8 Standard normal distribution showing area between $z = -2.65$ and $z = +2.65$ 



$-\infty$ and 2.5. Or we can say that 99.38 percent of the $z$’s have a value between 
$-\infty$ and 2.5. 


EXAMPLE 4.7.2 What is the probability that a $z$ picked at random from the 
population of $z$’s will have a value between $-2.65$ and $+2.65$? 

Figure 4.7.8 shows the desired area. We find the area between $-\infty$ and 2.65 
by locating 2.6 in the far left column of Table C, then moving across to the entry 
in the column headed by 0.05. The area at the intersection of 2.6 and 0.05 is 
0.9960. Similarly we find the area from $-\infty$ to $-2.65$ to be 0.0040. We find the 
area we want by subtracting 0.0040 from 0.9960. That is, 

$$P(-2.65 < z < 2.65) = 0.9960 - 0.0040 = 0.9920$$

EXAMPLE 4.7.3 What proportion of $z$ values are between $-2.78$ and 1.47? 

Figure 4.7.9 shows the area desired. The area between $-\infty$ and 1.47 is 0.9292. 
The area between $-\infty$ and $-2.78$ is 0.0027. To find the desired area, we subtract 
0.0027 from 0.9292 to obtain 0.9265. That is, 

$$P(-2.78 < z < 1.47) = 0.9292 - 0.0027 = 0.9265$$

EXAMPLE 4.7.4 Given the standard normal distribution, find P(z > 1.73). 

Figure 4.7.10 shows the area desired. We can obtain the area to the right of 
$z = 1.73$ by subtracting the area between $-\infty$ and 1.73 from 1. Thus 

$$P(z > 1.73) = 1 - P(-\infty < z < 1.73) = 1 - 0.9582 = 0.0418$$
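
The four table lookups in Examples 4.7.1 through 4.7.4 can be reproduced with a short script. This is a sketch under the assumption that `statistics.NormalDist().cdf` plays the role of Table C; the helper name `area_left` is illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def area_left(z):
    """Area under the standard normal curve from minus infinity to z."""
    return STANDARD_NORMAL.cdf(z)


example_1 = area_left(2.5)                       # area to the left of 2.5
example_2 = area_left(2.65) - area_left(-2.65)   # area between -2.65 and 2.65
example_3 = area_left(1.47) - area_left(-2.78)   # area between -2.78 and 1.47
example_4 = 1 - area_left(1.73)                  # area to the right of 1.73

for label, value in [("4.7.1", example_1), ("4.7.2", example_2),
                     ("4.7.3", example_3), ("4.7.4", example_4)]:
    print(label, round(value, 4))
```

Each result agrees with the worked answer to four decimals; small differences in the last digit can appear in other problems because table entries are themselves rounded.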







FIGURE 4.7.9 Standard normal distribution showing area between $z = -2.78$ and $z = +1.47$ 

FIGURE 4.7.10 Standard normal distribution showing area to right of $z = 1.73$ 

Again, the probability that $z$ is between two values $z_a$ and $z_b$ is equal to the 
area under the curve between perpendicular lines erected at $z_a$ and $z_b$. The area 
above a single point, say $z_a$, is equal to zero. Thus the probability that $z = z_a$ is 0. 
That is, $P(z = z_a) = 0$. Therefore, the probability that $z$ is greater than or equal 
to $z_a$ is the same as the probability that $z$ is greater than $z_a$. Using symbols, we 
can write $P(z \ge z_a) = P(z > z_a)$. For example, Table C tells us that 
$P(z \ge 1.5) = 0.0668$. Since $P(z = 1.5) = 0$, $P(z > 1.5) = 0.0668$, also. Similarly, 
$P(z_a \le z \le z_b) = P(z_a < z < z_b)$ and $P(z_a \le z) = P(z_a < z)$. 

Exercises 

Given the standard normal distribution, find: 

4.7.1 The area under the curve between z = 0 and z = 1.54. 

4.7.2 The probability that a $z$ picked at random has a value between $z = -2.07$ and 
$z = 2.33$. 

4.7.3 P(z > 0.65) 

4.7.4 P(z > -0.65) 

4.7.5 P(z < -2.33) 

4.7.6 P(z < 2.33) 

4.7.7 P(-1.96 < z < 1.96) 

4.7.8 P(-2.58 < z < 2.58) 

4.7.9 P(-3.10 < z < 1.25) 

4.7.10 $P(1.47 \le z < 3.44)$ 

Given the following probabilities, find $z_1$: 

4.7.11 $P(z < z_1) = 0.0055$ 

4.7.12 $P(-2.67 < z < z_1) = 0.9718$ 

4.7.13 $P(z > z_1) = 0.0384$ 

4.7.14 $P(z_1 < z < 2.98) = 0.1117$ 

4.7.15 $P(-z_1 < z < z_1) = 0.8132$ 

Applications of the Normal Distribution 

The normal distribution is very important in statistical inference. We should 
realize, however, that it is not a natural law that we encounter each time we analyze 
a continuous random variable. The normal distribution is a theoretical, or ideal, 
distribution. No set of measurements conforms exactly to its specifications. Many 
sets of measurements, however, are approximately normally distributed. In such 
cases the normal distribution is quite useful when we try to answer practical 
questions regarding these data. 

In particular, whenever a set of measurements is approximately normally 
distributed, we can find the probability of occurrence of values within any specific 
interval, just as we can with the standard normal distribution. We can do this 
because we can easily transform any normal distribution with a known mean $\mu$ 
and standard deviation $\sigma$ to the standard normal distribution. Once we have made 
this transformation, we can use a table of standard normal areas, such as Table 
C, to find relevant probabilities. 

We can transform a normal distribution to the standard normal distribution using 
the formula 

$$z = \frac{x - \mu}{\sigma} \tag{4.7.3}$$

This transforms any value of $x$ in the original distribution to the corresponding 
value of $z$ in the standard normal distribution. 

Suppose, for example, that we have a population of measurements that are 
approximately normally distributed with a mean of 10 and a standard deviation 
of 2.5. Suppose, also, that we want to find the probability that a measurement 
selected at random from this population will have a value equal to or greater than 
15. We first transform $x = 15$ to its corresponding $z$ value. That is, 

$$z = \frac{x - \mu}{\sigma} = \frac{15 - 10}{2.5} = \frac{5}{2.5} = 2$$

Figure 4.7.11 shows the relationship between the original distribution and the 
standard normal distribution, with the area of interest shaded. The figure shows 
that the distance from the mean, 10, to the value of interest, 15, is $15 - 10 = 5$. 
This is a distance of 2 standard deviations. When $x$ values are transformed to 
$z$ values, the distance of a $z$ value from its mean, 0, is equal to the distance of 
the corresponding $x$ value from its mean in standard deviation units. In the present 
example, $x$ is 2 standard deviations from its mean. In the $z$ distribution, a standard 
deviation is equal to 1. Therefore the point on the $z$ scale located 2 standard 
deviations from 0 is $z = 2$. This is the same result that we obtained using the 





FIGURE 4.7.11 Original distribution of X (approximately normal) and corresponding 
standard normal distribution. Note that the standard normal distribution is “skinnier” 
than the original distribution because $\sigma = 1$ is smaller than $\sigma = 2.5$. 



formula. From Table C, the area to the right of $z = 2$ is 0.0228. We can 
summarize this discussion as follows: 

$$P(X \ge 15) = P\left(z \ge \frac{15 - 10}{2.5}\right) = P(z \ge 2) = 0.0228$$

In Figure 4.7.11, the distribution of $z$ is “skinnier” than the distribution of $x$. 
This is because the distribution of $z$ has a smaller standard deviation (1) than does 
the distribution of $x$, which has a standard deviation of 2.5. Subsequent pictures 
of the standard normal distribution will not be drawn to scale. 
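
The transformation to $z$ and the resulting tail area can be sketched as follows. The sketch assumes `statistics.NormalDist` in place of Table C; the variable names are illustrative. It computes the probability two ways, through the $z$ value and directly on the original scale, to show that the two routes agree.

```python
from statistics import NormalDist

mu, sigma = 10, 2.5   # mean and standard deviation of the population in the text
x = 15                # value of interest

# Equation 4.7.3: distance of x from the mean in standard deviation units.
z = (x - mu) / sigma

# Route 1: look up the area to the right of z on the standard normal curve.
p_via_z = 1 - NormalDist().cdf(z)

# Route 2: work directly with the normal distribution that has mean mu and sd sigma.
p_direct = 1 - NormalDist(mu, sigma).cdf(x)

print(z, round(p_via_z, 4), round(p_direct, 4))
```

Because both routes describe the same area, standardizing is a convenience for table lookup rather than a change in the probability itself.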


EXAMPLE 4.7.5 Refer to Example 4.6.1. Suppose that the population of lengths of 
aluminum-coated steel sheets is approximately normally distributed with a mean 
of $\mu = 30.5$ inches and a standard deviation of $\sigma = 0.2$ inch. What is the 
probability that a sheet selected at random from the population is between 30.250 
and 30.750 inches long? 

First we transform each value of $x$ to the corresponding value of $z$. Thus, 

$$z_1 = \frac{30.250 - 30.500}{0.2} = \frac{-0.250}{0.2} = -1.25$$

and 

$$z_2 = \frac{30.750 - 30.500}{0.2} = \frac{0.250}{0.2} = 1.25$$




FIGURE 4.7.12 Normal distribution and corresponding standard normal distribution, 
Example 4.7.5 

Figure 4.7.12 shows the relationship between the original distribution and the 
standard normal distribution. The areas of interest are shaded. The probability we 
seek is 

$$P(30.250 < X < 30.750) = P(-1.25 < z < 1.25) = P(z < 1.25) - P(z < -1.25)$$

Table C shows this to be $0.8944 - 0.1056 = 0.7888$. 
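
The interval calculation in Example 4.7.5 generalizes to any pair of limits. Below is a minimal sketch, assuming `statistics.NormalDist` as the source of standard normal areas; the function name `normal_prob_between` is a hypothetical helper, not something defined in the text.

```python
from statistics import NormalDist


def normal_prob_between(mu, sigma, low, high):
    """P(low < X < high) for a normal X, found by standardizing both limits."""
    z_low = (low - mu) / sigma
    z_high = (high - mu) / sigma
    standard = NormalDist()
    return standard.cdf(z_high) - standard.cdf(z_low)


# Sheet lengths from Example 4.7.5: mean 30.5 inches, standard deviation 0.2 inch.
p = normal_prob_between(30.5, 0.2, 30.250, 30.750)
print(round(p, 4))
```

The computed value can differ from the table-based 0.7888 by about one unit in the fourth decimal place, because the table entries 0.8944 and 0.1056 are rounded before they are subtracted.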

The Normal Approximation to the Binomial 

The normal distribution gives a good approximation to the binomial distribution 
when $n$ is large and $p$ is not too close to 0 or 1. This enables us to calculate 
probabilities for large binomial populations for which binomial tables are not 
available. A good rule of thumb is that the normal approximation to the binomial 
is appropriate when $np$ and $n(1 - p)$ are both greater than 5. To use the normal 
approximation, we use $\mu = np$ and $\sigma = \sqrt{np(1 - p)}$. We convert values of the 
original variable to values of $z$ to find the probabilities of interest. 
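
The rule of thumb and the two parameters can be written out as a small helper. This is only a sketch; the function name `normal_approx_params` and its return layout are assumptions made for illustration.

```python
from math import sqrt


def normal_approx_params(n, p):
    """Return (mu, sigma, ok) for the normal approximation to a binomial(n, p).

    ok is True when the rule of thumb holds: np and n(1 - p) both exceed 5.
    """
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    ok = n * p > 5 and n * (1 - p) > 5
    return mu, sigma, ok


# n = 20 and p = 0.3 give np = 6 and n(1 - p) = 14, so the rule is satisfied.
mu, sigma, ok = normal_approx_params(20, 0.3)
print(round(mu, 2), round(sigma, 2), ok)

# n = 20 and p = 0.1 give np = 2, so the approximation is not recommended.
print(normal_approx_params(20, 0.1)[2])
```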

The normal distribution is continuous and the binomial is discrete. Therefore 
we can get better results if we make an adjustment to account for this when we 
use the approximation. The need for such an adjustment, called the continuity 
correction, is evident when we compare a histogram, constructed from binomial 
data, with a superimposed smooth curve. Figure 4.7.13 illustrates this situation 
for $n = 20$ and $p = 0.3$. 

In Figure 4.7.13, the probability that $X = x$ is equal to the area of the rectangle 
centered at $x$. For example, the probability that $X = 8$ is equal to the area of the 
rectangle centered at 8. We can see that this rectangle extends from 7.5 to 8.5. 
In Table A, we find that this area is equal to 0.1144. The corresponding area is 
shaded in Figure 4.7.13a. 

When we use the normal approximation to the binomial distribution, we take 
into account the fact that for the binomial distribution, $P(X = x)$ is the area of a 
rectangle centered at $x$. When we convert values of $x$ to values of $z$, the continuity 
correction consists of adding 0.5 to, and/or subtracting 0.5 from, $x$ as appropriate. 
To illustrate, let us use the continuity correction and normal approximation to find 
the probability that $X$ assumes a value between $x_a = 7.5$ and $x_b = 8.5$. Converting 
to $z$ values, we have 









FIGURE 4.7.13 Normal approximation to the binomial with $n = 20$, $p = 0.3$, and 
$\mu = np = 6$, showing $P(X = 8)$. Vertical axis: probability. 
(a) $P(X = 8)$ using binomial probability (shaded area 0.1144) 
(b) $P(7.5 < X < 8.5)$ using normal approximation (shaded area 0.1215) 



$$z_a = \frac{7.5 - 6}{\sqrt{(20)(0.3)(0.7)}} = \frac{1.5}{2.05} = 0.73$$

$$z_b = \frac{8.5 - 6}{\sqrt{(20)(0.3)(0.7)}} = \frac{2.5}{2.05} = 1.22$$


From Table C, the probability we seek is 0.1215, which is reasonably close to 
the exact probability of 0.1144. The area under the normal curve corresponding 
to $P(7.5 < X < 8.5)$ is shaded in Figure 4.7.13b. 
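
The comparison between the exact binomial probability and its continuity-corrected normal approximation can be sketched in code. The sketch assumes `math.comb` for the binomial coefficient and `statistics.NormalDist` for normal areas; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, x = 20, 0.3, 8

# Exact binomial probability P(X = 8), the quantity read from Table A.
exact = comb(n, x) * p**x * (1 - p)**(n - x)

# Normal approximation with the continuity correction:
# the area under the normal curve between x - 0.5 and x + 0.5.
mu = n * p
sigma = sqrt(n * p * (1 - p))
approximating_normal = NormalDist(mu, sigma)
approx = approximating_normal.cdf(x + 0.5) - approximating_normal.cdf(x - 0.5)

print(f"exact binomial: {exact:.4f}")
print(f"normal approx.: {approx:.4f}")
```

The exact value matches the tabled 0.1144. The approximation comes out near 0.121 rather than exactly 0.1215, because the script does not round $\sigma$ to 2.05 or the $z$ values to two decimals as the hand calculation does.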

Let us use this same example to find $P(5 \le X \le 10)$. Using binomial 
probabilities from Table A, we find the answer to be 0.7454. The corresponding area 
is shaded in Figure 4.7.14a. 

To use the normal approximation, we find 

$$\begin{aligned}
P(4.5 < X < 10.5) &= P\left(\frac{4.5 - 6}{2.05} < z < \frac{10.5 - 6}{2.05}\right) \\
&= P(-0.73 < z < 2.20) \\
&= 0.9861 - 0.2327 = 0.7534
\end{aligned}$$


The corresponding area is shaded in Figure 4.7.14b. 

Again we see that the normal approximation gives a result that is quite close 
to the exact probability. If we had not used the continuity correction, the normal 
approximation would have given 

$$\begin{aligned}
P(5 \le X \le 10) &= P\left(\frac{5 - 6}{2.05} < z < \frac{10 - 6}{2.05}\right) \\
&= P(-0.49 < z < 1.95) \\
&= 0.9744 - 0.3121 = 0.6623
\end{aligned}$$


This approximation is not nearly as close to the true probability of 0.7454 as 
is the approximation obtained with the continuity correction. When $n$ is large and 
$p$ is not too close to 0 or 1, we usually omit the continuity correction when finding 
probabilities associated with such intervals as $P(x_a \le X \le x_b)$, $P(X \le x)$, or 
$P(X \ge x)$. 
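
The effect of the continuity correction on an interval probability can be checked numerically. This is a sketch under the same assumptions as before (`math.comb` and `statistics.NormalDist` from the Python standard library); the names `binomial_pmf`, `with_correction`, and `without_correction` are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.3
low, high = 5, 10   # the event 5 <= X <= 10


def binomial_pmf(k):
    """Exact binomial probability P(X = k) for the n and p above."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


# Exact probability: sum the rectangle areas for k = 5, 6, ..., 10.
exact = sum(binomial_pmf(k) for k in range(low, high + 1))

approximating_normal = NormalDist(n * p, sqrt(n * p * (1 - p)))

# With the continuity correction the limits widen by 0.5 on each side.
with_correction = (approximating_normal.cdf(high + 0.5)
                   - approximating_normal.cdf(low - 0.5))

# Without the correction the half-rectangles at both ends are lost.
without_correction = (approximating_normal.cdf(high)
                      - approximating_normal.cdf(low))

print(f"exact:              {exact:.4f}")
print(f"with correction:    {with_correction:.4f}")
print(f"without correction: {without_correction:.4f}")
```

The corrected and uncorrected values come out near 0.754 and 0.662; they differ slightly from the hand results 0.7534 and 0.6623 because the script keeps $\sigma$ and the $z$ values unrounded.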


(In the following exercises, draw pictures to show areas and points of interest.) 

4.7.16 Given a normally distributed population of values with a mean of 76 and a standard 
deviation of 10: (a) What proportion of values are between 71 and 82? (b) What proportion 
are greater than 75? (c) What is the probability that a value picked at random from this 
population is less than 78? 

4.7.17 Suppose that the diameters of lids for tin cans produced by a certain manufacturer 
are normally distributed with a mean of 4 inches and a standard deviation of 0.012 inch. 
What proportion of the lids produced are between 3.97 inches and 4.03 inches? 

4.7.18 Given a normal distribution of values with a mean of 120 and a variance of 16, 
what proportion of the values are greater than 121? 










FIGURE 4.7.14 Normal approximation to the binomial with $n = 20$, $p = 0.3$, and 
$\mu = np = 6$, showing $P(5 \le X \le 10)$. Vertical axis: probability. 
(a) $P(5 \le X \le 10)$ using binomial probability 
(b) $P(4.5 < X < 10.5)$ using normal approximation (shaded area 0.7534) 


4.7.19 The weights of a certain melon are normally distributed with a mean of 14 ounces 
and a standard deviation of 1.22 ounces. What is the probability that a melon drawn at 
random from this population will weigh less than 12 ounces? 

4.7.20 A bank official finds that the lengths of time customers have to wait to be serviced 
by a teller are approximately normally distributed with a mean of 3 minutes and a standard 
deviation of 1 minute. (a) What proportion of customers have to wait longer than 2 minutes 
but less than 3½ minutes? (b) What proportion of customers have to wait 1 minute or less? 
(c) You are about to enter the bank. What is the probability that you will have to wait 
longer than 5 minutes? 

4.7.21 A production supervisor found that employees, on the average, complete a certain 
task in 10 minutes. The times required to complete the task are approximately normally 
distributed, with a standard deviation of 3 minutes. Find the following: (a) The proportion 
of employees completing the task in less than 4 minutes. (b) The proportion of employees 
requiring more than 5 minutes to complete the task. (c) The probability that an employee 
who has just been assigned the task will complete it within 3 minutes. 

4.7.22 In a certain large firm, 30% of the employees are females. A random sample of 
50 is selected from this population. What is the probability that the number of females 
will be between 20 and 24, inclusive? 

4.7.23 Suppose that only 40% of the residents of a city favor a certain zoning petition. 
What is the probability that a random sample of 100 citizens will contain between 30 and 
50, inclusive, who favor the petition? 

Summary 

This chapter introduced you to some important probability distributions. If you 
understand the basic concepts presented here, you will understand the ideas 
presented in the later chapters more easily. Chapter 3, this chapter, and the next 
chapter serve as a bridge connecting the methods and concepts of descriptive 
statistics with those of inferential statistics. 

In this chapter you learned about discrete probability distributions and 
continuous probability distributions. You learned that the probability distribution of a 
discrete random variable is a table, graph, formula, or other device used to specify 
all possible values of a discrete random variable along with their respective 
probabilities. 

You learned that if a random variable $X$ is continuous, we cannot speak 
meaningfully of the probability that $X = x$. Thus the probability distribution of a 
continuous random variable must be defined differently. A nonnegative function 
$f(x)$ is called a probability distribution of the continuous random variable $X$ if the 
total area bounded by its curve and the $x$ axis is equal to 1, and if the subarea 
bounded by the curve, the $x$ axis, and perpendiculars erected 
at any two points $a$ and $b$ gives the probability that $X$ is between the points $a$ 
and $b$. 

The discrete probability distributions covered in this chapter are the binomial, 
the Poisson, and the hypergeometric distributions. The binomial distribution 
provides an appropriate model for analysis when the available data consist of $n$ 
repetitions of a Bernoulli trial. A Bernoulli trial is a trial that can result in 
only one of two mutually exclusive outcomes. 

The Poisson distribution is a good model for analyzing a variety of business 
problems, such as those concerned with product demand, the occurrence of 
accidents in a factory, and arrival patterns of customers at various types of service 
counters. 

When we sample without replacement from a small population, the 
hypergeometric distribution often serves as the model for determining the probability of 
some predetermined number of successes. Using the hypergeometric formula to 
calculate probabilities is very tedious if the population is large. Fortunately, in 
most situations in which the population is large, the hypergeometric model is 
satisfactorily approximated by the binomial distribution. 

The only continuous probability distribution covered in this chapter is the 
normal distribution. This is the most important distribution we encounter in the study 
and practice of statistics. We shall use it often in later chapters. 

Review Questions 

1. What is a discrete random variable? Give three examples from the field of business. 

2. What is a continuous random variable? Give three examples from the field of business. 

3. Define the probability distribution of a discrete random variable. 

4. Define the probability distribution of a continuous random variable. 

5. What is a cumulative probability distribution? 

6. What is a Bernoulli trial? 

7. Describe the binomial distribution. 

8. Give an example of a random variable that you think follows a binomial distribution. 

9. Describe the Poisson distribution. 

10. Give an example of a random variable that you think is distributed according to the 
Poisson law. 

11. Describe the normal distribution. 

12. Describe the standard normal distribution and tell how it is used in statistics. 

13. Give an example of a random variable that you think is at least approximately normally 
distributed. 

14. Using the example you gave in Question 13, demonstrate the use of the standard 
normal distribution in answering probability questions relating to this variable. 

15. State Chebyshev’s theorem and explain how it may be used. 

16. Discuss the concept of expected value. 

17. The following is the frequency distribution of the number of times each of 100 
machines in a factory broke down during the past year: (a) Construct the probability 
distribution of the random variable X = number of machine breakdowns. (b) Draw a graph of 
the distribution. (c) Construct and graph the cumulative probability distribution of X. 

x_i (number of breakdowns)        0   1   2   3   4   5   6   7   8   9  10 
Frequency of occurrence of x_i   10  15  20  25  15   5   3   2   2   2   1 

18. In Exercise 17, find: (a) the probability that a machine picked at random is one that 
broke down 3 times; (b) the probability that a randomly selected machine is one that broke 
down 4 or fewer times; (c) the probability that a machine picked at random is one that 
broke down between 4 and 7 times, inclusive; (d) the probability that a randomly selected 
machine is one that broke down either 1 or 2 times; (e) P(X > 5); (f) P(X < 1); 
(g) P(X > 5); (h) P(0 < X < 3). 

19. Customers entering a store have 5 brands (equal in price and weight) of a certain 
product from which to choose. Say that 30% of the customers prefer Brand A. What is 
the probability that of 20 customers making a selection from the available brands, Brand 
A is selected by (a) at least half? (b) more than half? (c) between 5 and 10, inclusive? (d) 
14 or more? 

20. You know that 5% of all items produced at a certain factory are defective. You select 
25 items at random from a day’s production. What is the probability that the number of 
defectives in the sample is: (a) at least 1? (b) no more than 5? (c) between 3 and 7, 
inclusive? (d) 10 or more? (e) 0? 

21. The number of calls made from a pay telephone during a given time interval is Poisson 
distributed with $\lambda = 3.2$. What is the probability that during such a time interval, the 
number of calls made from this telephone is: (a) no more than 2? (b) more than 2? (c) 5 
or fewer? (d) more than 5? (e) 0? (f) at least 1? 

22. The demand for key rings at a variety store during a day is Poisson distributed with 
$\lambda = 2.4$. What is the probability that on a given day the number of calls for key rings is 
(a) 0? (b) at least 1? (c) more than 1? (d) between 3 and 5 inclusive? 

23. Given the standard normal distribution, find $P(-1.65 < z < 1.65)$. 

24. Given the standard normal distribution, find P(z — 0.75). 

25. A population of values is normally distributed with a mean of 25.5. It is known that 
75.49% of the values are less than 27.8. What is the standard deviation of the population? 

26. The inside diameters of metal washers produced by a certain factory are normally 
distributed with a mean of 0.5 inch and a standard deviation of 0.01 inch. Of all washers 
produced, 0.008 are rejected because they are too small for a bolt used to test them. What 
is the diameter of the bolt used for the test? 

27. The breaking strengths of plastic bottles are normally distributed with a variance of 
25. Approximately 0.0197 of specimens produced are rejected because they fail a 
quality-control test that subjects them to 255 psi of pressure. What is the mean breaking strength 
of the bottles? 

28. Agricultural experts have found that when a certain type of fertilizer is used, yields 
per acre of a certain grain are approximately normally distributed, with a mean and standard 
deviation of 40 and 10 bushels per acre, respectively. (a) When this fertilizer is used, 
what proportion of the acreage planted in this grain yields more than 50 bushels per acre? 
(b) What is the probability that a randomly selected acre will yield less than 15 bushels? 

29. Scores made by employees on a manual dexterity test are normally distributed with a 
mean of 600 and a variance of 10,000. (a) What proportion of employees taking the test 
score below 300? (b) An employee is about to take the test. What is the probability that 
the employee’s score will be 850 or more? (c) What proportion of employees score between 
450 and 700? (d) Management has decided that those employees whose scores are among 
the top 10% will be considered for promotion to a better job. What score must an employee 
make in order to be eligible for promotion? (e) Suppose that management decides to 
consider for promotion only those employees who make a score of 800 or more. What 
percentage of the employees will be eligible to be considered for promotion? 

30. In a household survey, a market research firm found that family gasoline consumption 
per month was approximately normally distributed with a mean and standard deviation of 
70 and 9 gallons, respectively. (a) What proportion of families use between 55 and 73 
gallons per month? (b) A family is picked at random from this population. What is the 
probability that it uses more than 72 gallons per month? (c) What proportion of families 
in this population use less than 68 gallons? 

31. In a suburban community, on a given weekday evening, the head of the household is 
at home in 65% of the households. A researcher conducting a telephone survey randomly 
selects 15 households to call in an evening. What is the probability that the researcher will 
find the head of the household at home in exactly 8 households? 

32. You know that 80% of the people applying for a certain job have had no previous 
experience in this job. You select a random sample of 5 current applicants. What is the 
probability that exactly 3 have had no previous experience in the job? 

33. It is estimated that 30% of a certain group of truck drivers eat at a certain truck stop. 
Suppose that the estimate is correct. You question 25 drivers picked at random from this 
group. What is the probability that the number eating at the truck stop will be: (a) 5 or 
fewer? (b) between 19 and 15, inclusive? (c) 15 or more? 

34. In a certain population of adolescents, the proportion who have their own cars is 0.40. 
A random sample of 20 is selected from the population. What is the probability that the 
number who have their own cars will be: (a) greater than 10? (b) fewer than 5? (c) between 
5 and 15, inclusive? 

35. In a population of executives, 40% are under 45. What is the probability that in a 
random sample of 15 of these executives, 8 or more are under 45? 

36. You know that 20% of the salaried employees in a certain city have less than a high 
school education. You take a random sample of 20 of these employees. What is the 
probability that between 10 and 15, inclusive, have less than a high school education? 

37. The standard deviation of employees’ scores on an aptitude test is 10. What is the 
probability that the score of a randomly selected employee differs by more than 2 points 
from the mean score of all employees? Assume that scores are approximately normally 
distributed. 

38. The mean breaking strength of a certain brand of plastic trash bag is 10 pounds per 
square inch. Breaking strengths are approximately normally distributed with a standard 
deviation of 0.1. In a shipment of 10,000 bags, how many would we expect to find with 
breaking strengths below 9.8 pounds per square inch? The manufacturer says that if 1% 
or more of the bags have breaking strengths below 9.75 pounds per square inch, the firm 
will look for a stronger raw material. Should the firm do so at this time? 

39. On the average, a certain supermarket sells 250 quarts of milk per day. The standard 
deviation is 25. (a) On a given day, the supermarket stocks 300 quarts. What is the 
probability that all will be sold? Assume that the number of quarts sold per day is 
approximately normally distributed. (b) How many quarts should the supermarket stock if 
the proprietor wants the probability of not being able to meet the demand for quarts of 
milk to be 0.01? 

40. Given the normally distributed random variable X, find the numerical value of k such 
that $P(\mu - k\sigma < X < \mu + k\sigma) = 0.754$. 

41. Given the normally distributed random variable X with mean 100 and standard 
deviation 15, find the numerical value of k such that: (a) $P(X < k) = 0.0094$, 
(b) $P(X > k) = 0.1093$, (c) $P(100 < X < k) = 0.4778$, (d) $P(k' < X < k) = 0.9660$, 
where $k'$ and $k$ are equidistant from $\mu$. 

42. Given the normally distributed random variable X with $\sigma = 10$ and $P(X < 40) = 0.0080$, 
find $\mu$. 

43. Given the normally distributed random variable X with $\sigma = 15$ and $P(X < 50) = 0.9904$, 
find $\mu$. 

44. Given the normally distributed random variable X with $\sigma = 5$ and $P(X > 25) = 0.0526$, 
find $\mu$. 

45. Given the normally distributed random variable X with $\mu = 25$ and $P(X < 10) = 0.0778$, 
find $\sigma$. 

46. Given the normally distributed random variable X with $\mu = 30$ and $P(X < 50) = 0.9772$, 
find $\sigma$. 

47. Let $n = 20$. Find $\mu$ and $\sigma$ of the binomial distribution when (a) p = 0.1, 
(b) p = 0.2, (c) p = 0.3, (d) p = 0.4, (e) p = 0.5, (f) p = 0.6, (g) p = 0.7, 
(h) p = 0.8, (i) p = 0.9. For which value of p is $\sigma$ smallest? Largest? 

48. In a clothing factory, the average number of machines that are inoperable on a given 
day is 3. Assume that the Poisson distribution is applicable. What is the probability that 
on a given day: (a) there are more than 3 inoperable machines? (b) there are fewer than 
3 inoperable machines? (c) all machines are operable? 

49. In a large office, the number of employees smoking in any 15-minute time period is 
Poisson distributed with a mean of 5. What is the probability that during a randomly 
selected 15-minute period, the number of employees smoking is (a) 3 or more? (b) between 
5 and 8, inclusive? (c) fewer than 4? 

50. In a large pine forest the number of trees per acre infested with Southern pine beetles 
is thought to be Poisson distributed with a mean of 0.2. If this is true, find the probability 
that on a randomly selected acre the number of infested trees is (a) 0, (b) more than 3, 
(c) either 1 or 2, (d) exactly 1. 

51. An automobile salesperson sells on the average 3 cars per week. Find the probability 
that during a given week the salesperson sells: (a) 1 car, (b) no cars, (c) 2 or more cars, 
(d) 5 or more cars. 

52. At a certain intersection there is an average of 3 accidents per week. Assuming that 
the number of accidents per week follows a Poisson distribution, what is the probability 
that in a given week there are (a) 6 or more accidents? (b) no accidents? (c) between 3 
and 7 accidents, inclusive? (d) 3 or fewer accidents? 

53. In a large firm, the number of employee absences is Poisson distributed with a mean 
of 10 employees absent per day. On a randomly selected day, what is the probability that 
the number of employees absent is (a) more than 5? (b) more than 15? (c) between 5 and 
15, inclusive? (d) fewer than 8? (e) fewer than 12? 

54. The manufacturer of a certain prestige item has found that the demand for this product 
closely follows a Poisson distribution. The expected demand is 5 per month. Find the 
probability that in a given month the demand is (a) no items, (b) 6 or more items, (c) 4 
or fewer items. 

55. In a population of employees, 10% have completed high school. Suppose that a 
random sample of 50 persons is selected from the population. What is the probability that 
8 or more will be high school graduates? Use the Poisson approximation to the binomial 
distribution to find the answer. 

56. A population consists of 20 clerical workers. Of these workers, 12 have been with 
their present employer for more than 5 years. You select, without replacement, a random 
sample of five workers from this population. What is the probability that exactly 3 will 
have been with their present employer for more than 5 years? 







57. In a population of heads of household, 38% own their homes. You select a simple 
random sample of size 15 from this population. What is the probability that the number 
in the sample who own their homes will be (a) exactly 8? (b) between 5 and 7, inclusive? 
(c) at least 12? (d) 9 or more? 

58. In a population of executives, 10% have changed employers within the past two years. 
You select a simple random sample of 70 from this population. What is the probability 
that 12 or more in the sample will have changed employers within the past two years? 

59. In a certain county, in 70% of the households, there is at least one person who is over 
40. You draw a random sample of 200 from this population. What is the probability that 
the number of households with a person over 40 is between 150 and 155, inclusive? 




5. Some Important Sampling 
Distributions 


Chapter Objectives: This chapter is the most important 
one in the book. It holds the key to statistical inference. 
Study this chapter very carefully. Pay special attention to 
the general discussion of sampling distributions. Before 
you can understand statistical inference, you must understand
sampling distributions. After studying this chapter
and working the exercises, you should be able to do the 
following. 

1. Define a simple random sample 

2. Use a table of random numbers to draw a simple random
sample from a population

3. Construct the sampling distribution of sample means
computed from samples drawn from a small population

4. Determine the mean, standard error, and functional 
form of the sampling distribution of the mean, the 
difference between two means, a proportion, and the 
difference between two proportions 

5. Explain the central limit theorem and discuss its importance
in statistical inference

6. Use your knowledge of sampling distributions to compute
the probability associated with specified sample
results when the statistic of interest is the mean, the 
difference between two means, a proportion, or the 
difference between two proportions 

















5.1 INTRODUCTION 


Let us pause here and review briefly the material that we have covered so far. 
Chapter 1 emphasized the importance of statistics to the manager and the researcher.
The object of that chapter was to justify statistics as a worthwhile subject
for the student of business to study. Chapter 2 concerned the organization and 
summarization of data generated by the routine operation of a business firm or by 
a special study. It presented computations of the basic measures, such as the mean 
and standard deviation, used for describing a set of data. When computed from 
the data of a sample, these descriptive measures were called statistics. When 
computed from a population of data, they were called parameters. Chapter 3, the 
first of three chapters designed to lay the foundation for statistical inference, was 
devoted to the basic concepts of probability. Chapter 4 expanded on these concepts 
and introduced the idea of a probability distribution. Three distributions of discrete 
variables—the binomial, the hypergeometric, and the Poisson—were discussed in 
detail, as was the normal distribution, a continuous distribution that we will refer 
to frequently in later chapters. 

We come now to the last of the three chapters that link the descriptive material 
of Chapter 2 and the concepts of inference that begin in Chapter 6. You must 
understand the principles introduced here if you are to understand the inferential
procedures that make up the major portion of this book.

The ideas presented in this chapter are based on the concept of sampling. 
Although we have already defined the word sample, here we shall discuss the
concepts of a sample and sampling in greater detail and in more technical terms. 


5.2 SIMPLE RANDOM SAMPLING 

It is important first to distinguish between two types of sample, the probability 
sample and the nonprobability sample. 

A probability sample is a sample of elements drawn from a population of 
elements in such a way that every element in the population has a known and 
nonzero probability of being selected. 

All other methods of selecting a sample are known as nonprobability methods. 
The only type of sample we consider in this text is the probability sample. This 
is because only for probability samples are there statistically sound procedures 
that allow us both to infer from a sample to the population from which it is drawn, 
and to obtain estimates of the sampling error involved. 

We usually do not sample small populations. Instead, when we need to know 
their characteristics, we examine them in their entirety. As a rule, we use sampling 
only when the population of interest is so large that examining it completely is 
impractical. 

From any finite population of size N, we can draw a finite number of different 
samples of size n. These samples are of interest when they are simple random
samples, which are a special kind of probability sample. A simple random sample
is defined as follows:

If a sample of size n is drawn from a population of size N in such a way that
every possible sample of size n has the same probability of being selected, the 
sample is called a simple random sample. 

The mechanics of drawing a sample that satisfies the definition of a simple 
random sample is called simple random sampling. When drawing a simple random 
sample, we can sample with replacement or without replacement. In practice, 
sampling is almost always done without replacement. 

In selecting a simple random sample from a population, in order to ensure true 
randomness, we must use some objective method. One such procedure involves 
the use of a table of random numbers, such as Table D of the Appendix. Using 
such a table ensures that every observation in the sampled population has an equal 
and independent chance of being selected. This is because each digit in the
random-number table was generated in such a way that the values 0 through 9 had
an equal and independent probability of occurring. 

EXAMPLE 5.2.1 Suppose that the population of interest consists of 200 workers at 
a certain firm. We want to draw a simple random sample from this population in 
order to find out how many units these employees produced during the past week. 
For each employee there is a card on which the production record is recorded. 
These cards, which are filed alphabetically in a card file, are also numbered in 
sequence from 001 to 200. Table 5.2.1 represents our population of interest. 

We can use Table D to draw a sample of size 10 without replacement from this 
population. Some samplers like to select a random starting point in the table of 
random numbers. However, since all the digits in the table are random, there is 
nothing wrong with starting with the very first number in the table. If we draw 
another sample from the same population later, and use the same table of random 
numbers, we begin drawing random numbers where we left off. This prevents us 
from drawing the same sample twice. Since we have 200 cards from which to 
choose, we can use only those three-digit numbers that are between 001 and 200, 
inclusive. 

The first three-digit number in Table D is 859, a number we cannot use. As 
we proceed down the column, we find that we can use the second number, 074. 
Therefore, employee number 074 is the first employee we select for inclusion in 
the sample. This employee produced 49 units, as shown in Table 5.2.1. We record 
both the random number used and the number of units produced. (We record the 
random number so we can keep track of the numbers used. Since we are sampling 
without replacement, we do not want to use the same random number twice.) If 
we proceed down the column in this manner, we obtain the random numbers and 
corresponding number of units produced that are shown in Table 5.2.2. Note that 
when we get to the bottom of the column, we merely shift one digit to the right 
and move up the column. This is one of many alternatives. For example, we could 
start at the top of the next column. 







TABLE 5.2.1 Number of units produced by 200 employees

Employee  Units   Employee  Units   Employee  Units   Employee  Units   Employee  Units
  001      30       041      59       081      65       121      47       161      62
  002      38       042      56       082      42       122      64       162      29
  003      33       043      65       083      73       123      55       163      37
  004      49       044      50       084      44       124      50       164      27
  005      33       045      54       085      54       125      65       165      36
  006      43       046      61       086      67       126      53       166      43
  007      60       047      57       087      49       127      32       167      30
  008      31       048      55       088      38       128      44       168      41
  009      34       049      26       089      59       129      38       169      59
  010      61       050      41       090      42       130      37       170      63
  011      49       051      64       091      46       131      53       171      55
  012      64       052      25       092      30       132      44       172      32
  013      62       053      28       093      64       133      27       173      32
  014      37       054      49       094      28       134      40       174      31
  015      25       055      53       095      64       135      43       175      35
  016      38       056      25       096      46       136      45       176      33
  017      65       057      33       097      59       137      33       177      58
  018      56       058      28       098      60       138      60       178      31
  019      55       059      25       099      46       139      62       179      38
  020      43       060      60       100      27       140      30       180      29
  021      58       061      34       101      59       141      51       181      43
  022      38       062      25       102      43       142      49       182      56
  023      71       063      61       103      50       143      31       183      53
  024      47       064      42       104      51       144      29       184      64
  025      65       065      48       105      39       145      36       185      38
  026      54       066      57       106      59       146      50       186      36
  027      74       067      26       107      33       147      54       187      59
  028      36       068      55       108      60       148      38       188      68
  029      62       069      36       109      26       149      60       189      26
  030      31       070      33       110      72       150      65       190      72
  031      48       071      63       111      25       151      36       191      29
  032      35       072      48       112      44       152      25       192      32
  033      26       073      37       113      58       153      28       193      73
  034      62       074      49       114      49       154      56       194      63
  035      51       075      46       115      31       155      51       195      69
  036      67       076      31       116      56       156      53       196      57
  037      30       077      26       117      37       157      40       197      38
  038      57       078      28       118      66       158      33       198      50
  039      50       079      63       119      55       159      26       199      60
  040      62       080      37       120      66       160      42       200      28


TABLE 5.2.2 Sample of 10 employees, showing number of units produced (from population of Table 5.2.1)

Random number   Sequence in sample   Units
     074                 1             49
     037                 2             30
     091                 3             46
     018                 4             56
     189                 5             26
     119                 6             55
     145                 7             36
     139                 8             62
     196                 9             57
     170                10             63



This completes the drawing of our simple random sample. Subsequently when 
we use the term simple random sample, we mean that the sample was drawn in 
this or an equivalent manner. Chapter 14 will discuss other types of sampling. 

Many computers currently on the market have the capability of generating 
random numbers. Rather than using printed tables of random numbers, you may 
use a computer to generate random numbers. Actually, the “random” numbers 
generated by computers are pseudorandom numbers. They are the result of a 
deterministic formula. However, as Fishman (1973) points out, the numbers appear
to serve satisfactorily for many practical purposes.
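
As a rough illustration of drawing the sample by computer rather than from a printed table, the following Python sketch selects 10 card numbers without replacement from the 200 cards of Example 5.2.1. The seed value and the variable names are arbitrary choices made for this sketch, and the numbers it prints will not match those of Table 5.2.2, which came from Table D.

```python
import random

# Card numbers 001 through 200, as in Example 5.2.1.
population_ids = range(1, 201)

# A fixed seed (an arbitrary value chosen for this sketch) makes the
# pseudorandom draw repeatable from one run to the next.
rng = random.Random(12345)

# random.sample draws without replacement, so no card number can repeat.
sample_ids = sorted(rng.sample(population_ids, k=10))

for card in sample_ids:
    print(f"{card:03d}")
```

Changing the seed, or omitting it, gives a different sample. Every set of 10 distinct card numbers is equally likely to be chosen, which is the defining property of a simple random sample.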

5.2.1 Use the table of random numbers to select another sample of size 10 from the 
population in Table 5.2.1. You may begin selecting random numbers at the place where 
Example 5.2.1 left off. 


5.3 SAMPLING DISTRIBUTIONS 

We can define a sampling distribution as follows: 

The distribution of all possible values that can be assumed by some statistic, 
computed from samples of the same size randomly drawn from the same population,
is called the sampling distribution of that statistic.

We can construct sampling distributions empirically from discrete, finite populations.
The construction of a sampling distribution consists of the following
steps. 

1. From a discrete, finite population of size N, randomly draw all possible samples
of size n. 

2. Compute the value of the statistic of interest for each sample. 

3. List in one column the different observed values of the statistic. In another 
column list the corresponding frequency of occurrence of each observed value of 
the statistic. 

Three characteristics of a given sampling distribution are of interest to us: its 
mean, its variance, and its functional form (how it looks when graphed). 

We cannot construct exact sampling distributions empirically when the population
we are sampling is infinite. In such a situation, we can only approximate
the sampling distribution of the statistic of interest by taking a large number of
samples. This is not a problem of any practical importance, since sampling distributions
per se are of only theoretical interest.

Since we can derive sampling distributions mathematically, the empirical construction
of a sampling distribution is of academic interest only. The mathematical derivations,
however, are not compatible with the mathematical level of this text. We treat the
subject in some detail in this chapter simply to help you understand the nature of 
a sampling distribution. For the more mathematical and theoretical aspects of 
sampling distributions, see the textbooks on mathematical statistics by Anderson
and Bancroft (1952), Freund and Walpole (1980), Hoel (1971), Hogg and Craig
(1978), and Mood et al. (1974).


Constructing the Sampling Distribution


5.4 DISTRIBUTION OF THE SAMPLE MEAN 

As we have seen, the arithmetic mean is an important descriptive measure for 
characterizing the central tendency of a set of data. In many situations, we want 
to know the mean of a population. This information may not be available unless 
we draw a sample from the population and make an inference regarding the 
parameter $\mu$ based on analysis of the sample data. We shall consider this procedure
in detail in Chapter 6. However, since the validity of this inferential procedure
depends on knowing the sampling distribution of the statistic involved, that is, 
the sample mean, let’s give some thought to this matter before we proceed. 

The text that follows illustrates the construction of a sampling distribution of 
the sample mean computed from samples drawn from a very small population. It 
is important that you realize that this is for instructional purposes only. In practice, 
we do not actually construct a sampling distribution as a preliminary to statistical 
inference. 


EXAMPLE 5.4.1 Suppose that a population consists of the 10 salespersons employed 
by a certain firm. The random variable of interest, X, is the number of years a 
salesperson has been with the firm. The values of the variable are as follows:
$x_1 = 3$, $x_2 = 6$, $x_3 = 2$, $x_4 = 4$, $x_5 = 8$, $x_6 = 7$, $x_7 = 9$, $x_8 = 5$, $x_9 = 1$,
$x_{10} = 10$. For this population, we may compute the following parameters:

$$\mu = \frac{\sum x_i}{N} = \frac{55}{10} = 5.5$$

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{82.5}{10} = 8.25$$


To construct the sampling distribution of $\bar{x}$ computed from samples drawn from
this population, we follow the steps outlined in Section 5.3. 

1. We draw all possible samples of some size n. Suppose that we let n = 2.
Table 5.4.1 shows the possible samples. Note that there are 100 samples. In
general, when we sample with replacement, as we have done here, there will be
$N^n$ possible samples of size n.

2. We compute the mean $\bar{x}$ for each of these samples. The sample means are
shown in parentheses in Table 5.4.1.

3. We list the different values of $\bar{x}$ that we observed, along with their frequencies
of occurrence. The resulting table, Table 5.4.2, constitutes the sampling distribution
of $\bar{x}$ for samples of size 2 from the specified population.

The individual probabilities (relative frequencies) shown in Table 5.4.2 are all 
greater than 0, and their sum is equal to 1. Thus the requirements for a probability 
distribution are met. 
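
The three steps can also be carried out by computer. The following Python sketch, offered only as an illustration, enumerates the 100 ordered samples of Example 5.4.1, computes each sample mean, and tallies the frequencies that appear in Table 5.4.2. The variable names are our own, and exact fractions are used so that equal means are counted together.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

years = [3, 6, 2, 4, 8, 7, 9, 5, 1, 10]  # population of Example 5.4.1
n = 2

# Step 1: all N**n ordered samples, drawn with replacement.
samples = list(product(years, repeat=n))

# Step 2: the mean of each sample, kept exact with Fraction.
means = [Fraction(sum(s), n) for s in samples]

# Step 3: the frequency of each distinct sample mean.
freq = Counter(means)

for value in sorted(freq):
    print(f"{float(value):4.1f}  {freq[value]:3d}  {freq[value]}/{len(samples)}")
```

Each printed line gives a value of the sample mean, its frequency, and its relative frequency, in the same order as Table 5.4.2.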

As stated earlier, we usually are interested in the functional form, the mean,
and the variance of a sampling distribution.



TABLE 5.4.1 All possible samples of size n = 2 from a population of size N = 10

First                                     Second draw
draw         1       2       3       4       5       6       7       8       9      10
  1        1,1     1,2     1,3     1,4     1,5     1,6     1,7     1,8     1,9    1,10
           (1)   (1.5)     (2)   (2.5)     (3)   (3.5)     (4)   (4.5)     (5)   (5.5)
  2        2,1     2,2     2,3     2,4     2,5     2,6     2,7     2,8     2,9    2,10
         (1.5)     (2)   (2.5)     (3)   (3.5)     (4)   (4.5)     (5)   (5.5)     (6)
  3        3,1     3,2     3,3     3,4     3,5     3,6     3,7     3,8     3,9    3,10
           (2)   (2.5)     (3)   (3.5)     (4)   (4.5)     (5)   (5.5)     (6)   (6.5)
  4        4,1     4,2     4,3     4,4     4,5     4,6     4,7     4,8     4,9    4,10
         (2.5)     (3)   (3.5)     (4)   (4.5)     (5)   (5.5)     (6)   (6.5)     (7)
  5        5,1     5,2     5,3     5,4     5,5     5,6     5,7     5,8     5,9    5,10
           (3)   (3.5)     (4)   (4.5)     (5)   (5.5)     (6)   (6.5)     (7)   (7.5)
  6        6,1     6,2     6,3     6,4     6,5     6,6     6,7     6,8     6,9    6,10
         (3.5)     (4)   (4.5)     (5)   (5.5)     (6)   (6.5)     (7)   (7.5)     (8)
  7        7,1     7,2     7,3     7,4     7,5     7,6     7,7     7,8     7,9    7,10
           (4)   (4.5)     (5)   (5.5)     (6)   (6.5)     (7)   (7.5)     (8)   (8.5)
  8        8,1     8,2     8,3     8,4     8,5     8,6     8,7     8,8     8,9    8,10
         (4.5)     (5)   (5.5)     (6)   (6.5)     (7)   (7.5)     (8)   (8.5)     (9)
  9        9,1     9,2     9,3     9,4     9,5     9,6     9,7     9,8     9,9    9,10
           (5)   (5.5)     (6)   (6.5)     (7)   (7.5)     (8)   (8.5)     (9)   (9.5)
 10       10,1    10,2    10,3    10,4    10,5    10,6    10,7    10,8    10,9   10,10
         (5.5)     (6)   (6.5)     (7)   (7.5)     (8)   (8.5)     (9)   (9.5)    (10)

Samples above or below the principal diagonal result when sampling is without replacement.
Sample means are in parentheses.


TABLE 5.4.2 Sampling distribution of $\bar{x}$ computed from samples in Table 5.4.1

                        Relative                              Relative
$\bar{x}$   Frequency   frequency     $\bar{x}$   Frequency   frequency
  1             1         1/100         6             9         9/100
  1.5           2         2/100         6.5           8         8/100
  2             3         3/100         7             7         7/100
  2.5           4         4/100         7.5           6         6/100
  3             5         5/100         8             5         5/100
  3.5           6         6/100         8.5           4         4/100
  4             7         7/100         9             3         3/100
  4.5           8         8/100         9.5           2         2/100
  5             9         9/100        10             1         1/100
  5.5          10        10/100        Total        100       100/100


FIGURE 5.4.1 Distribution of population and sampling distribution of $\bar{x}$ for n = 2



We can compare the functional form of the distribution of $\bar{x}$ that we just
constructed with the distribution of the original population. Figure 5.4.1 shows
both distributions. Observe that the two figures are very different. The population
distribution is a uniform distribution (that is, each value occurs with the same
frequency). The distribution of $\bar{x}$ is a symmetric distribution that is by no means
uniform.

An impressive feature of the sampling distribution of $\bar{x}$, as Figure 5.4.1 shows,
is the fact that the most frequently occurring value of $\bar{x}$ is 5.5, which is the
population mean. We also note the symmetric shape of the sampling distribution.
We shall see shortly that these characteristics are not unique to these particular
data. This is a general pattern of behavior that is inherent in sampling
distributions of sample means.

Now we compute the mean $\mu_{\bar{x}}$ of the sampling distribution by adding the 100
sample means given in Table 5.4.1 and dividing by 100. That is,

$$\mu_{\bar{x}} = \frac{\sum \bar{x}_i}{N^n} = \frac{550}{100} = 5.5$$

This formula is a special case of Formula 2.6.4, which shows how to compute
$\mu$, the mean of a population of original observations. In the present case we are
computing $\mu_{\bar{x}}$, the mean of a population of sample means. Therefore $\bar{x}_i$ in this
formula has the same role as $x_i$ in Formula 2.6.4, and $N^n$ here has the same role
as N in Formula 2.6.4.



Note that the mean of the sampling distribution of $\bar{x}$ is equal to the mean of
the original population.

Finally, we compute the variance of $\bar{x}$, $\sigma_{\bar{x}}^2$, as follows:

$$\sigma_{\bar{x}}^2 = \frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{N^n} = \frac{(1 - 5.5)^2 + (1.5 - 5.5)^2 + \cdots + (10 - 5.5)^2}{100} = \frac{412.5}{100} = 4.125$$


The variance of the sampling distribution is not equal to the variance of the 
population. However, the variance of the sampling distribution is equal to the 
variance of the population divided by the size of the sample used to obtain the 
sampling distribution. That is, 


$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} = \frac{8.25}{2} = 4.125$$


The square root of the variance of the sampling distribution, that is, the standard
deviation of the sampling distribution, $\sqrt{\sigma_{\bar{x}}^2} = \sigma/\sqrt{n}$, is called the standard
error of the mean or, simply, the standard error. It is written $\sigma_{\bar{x}}$. Variation in a
sampling distribution represents estimation errors for the various possible samples.
Thus we call the standard deviation of these errors the standard error.
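
The mean, variance, and standard error of the sampling distribution in Example 5.4.1 can be checked numerically. The Python sketch below is one way to do so. It uses the standard library's `pvariance`, which divides by the number of values and not by one less than that number, so it matches the population formulas used here. The variable names are our own.

```python
from itertools import product
from math import sqrt
from statistics import fmean, pvariance

years = [3, 6, 2, 4, 8, 7, 9, 5, 1, 10]  # population of Example 5.4.1
n = 2

# The 100 sample means from sampling with replacement.
means = [sum(s) / n for s in product(years, repeat=n)]

mu = fmean(years)                # population mean
sigma2 = pvariance(years)        # population variance (divides by N)

mu_xbar = fmean(means)           # mean of the sampling distribution
sigma2_xbar = pvariance(means)   # variance of the sampling distribution
std_error = sqrt(sigma2_xbar)    # standard error of the mean

print(mu, mu_xbar)               # the two means agree
print(sigma2 / n, sigma2_xbar)   # the variance equals sigma^2 / n
print(round(std_error, 3))
```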

Here is a summary of the symbols used to designate the mean, the variance, 
and the standard deviation of a sampled population, a single sample from the 
population, and the resulting sampling distribution of the sample mean. 


Descriptive measure    Sampled population    Single sample    Sampling distribution of $\bar{x}$
Mean                   $\mu$                 $\bar{x}$        $\mu_{\bar{x}}$
Variance               $\sigma^2$            $s^2$            $\sigma_{\bar{x}}^2$
Standard deviation     $\sigma$              $s$              $\sigma_{\bar{x}}$


Normally Distributed Populations


The fact that $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}}^2 = \sigma^2/n$ is not peculiar to this example. These
results are characteristic of sampling distributions in general when sampling is
with replacement from a finite population, or when the sampled population is
infinite. We can also describe a sampled population according to whether it is
normally or nonnormally distributed. The sampling distribution of $\bar{x}$ is different
in the two cases. Actually, as noted in Chapter 4, no variables are exactly normally 
distributed, since the normal distribution is a mathematical ideal that is not realized 
in practice. In this text, when we speak of a normally distributed population, we 
mean one that approximates a normal distribution well enough for us to use the 
properties of a normal distribution in describing it. 

The following descriptions of the sampling distribution of $\bar{x}$ under the two
conditions have been proved mathematically. For the proofs, see the mathematical
statistics texts mentioned earlier.

The Central Limit Theorem


When sampling is from a normally distributed population, the sampling distribution
of the sample mean will have the following properties.

1. The distribution of $\bar{x}$ will be normal, regardless of the size of the sample.

2. The mean $\mu_{\bar{x}}$ of the distribution of $\bar{x}$ will be equal to the mean of the
population from which the samples were drawn.

3. The variance $\sigma_{\bar{x}}^2$ of the distribution of $\bar{x}$ will be equal to the variance of the
population divided by the sample size.

In later chapters, when we make inferences about normally distributed populations,
we shall use these properties of the sampling distribution of $\bar{x}$. Sometimes
we may have some doubt as to whether or not the population of interest is normally
distributed. Chapter 11 discusses a procedure that you can use to help you determine
whether or not a population of unknown form is likely to be normally
distributed. 


In many situations the normal distribution so poorly approximates the population
of interest that, if we based our analyses on this distribution, we would get misleading
results. Consequently, when the sampled population is not normally distributed,
we must know the nature of the sampling distribution of the sample
mean.

Knowledge of the sampling distribution of $\bar{x}$ when sampling is from a nonnormally
distributed population comes from the proof of an important mathematical
theorem, the central limit theorem. We can summarize this theorem as follows.

Given a population of any functional form with a mean $\mu$ and finite variance
$\sigma^2$, the sampling distribution of $\bar{x}$, computed from simple random samples of
size n from this population, will be approximately normally distributed with
mean $\mu$ and variance $\sigma^2/n$ when the sample size is large.

The central limit theorem guarantees that if we sample from a nonnormally 
distributed population, we will get approximately the same results as we would 
if the population were normally distributed, provided that we take a large sample. 
This is an important result. It is useful in applying the techniques of statistical 
inference. 

To study the effect of the central limit theorem, we could draw k large sets of
samples of varying sizes, $n_1 < n_2 < \cdots < n_k$, from a nonnormally distributed
population and compare the resulting sampling distributions. We would find that
the larger the value of n, the more closely the sampling distribution resembles a
normal distribution. Figure 5.4.2 illustrates the general results of such a procedure.

Bradley (1971, 1973, 1976) constructed sampling distributions of $\bar{x}$ based on
samples of size 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024. He drew, with
replacement, 10,000 random samples of each size and computed $\bar{x}$ for each. He
compared a histogram plot of each sampling distribution with the appropriate
normal distribution. These plots showed, as does Figure 5.4.2, the effect of the
central limit theorem on the sampling distribution of $\bar{x}$.
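
A small simulation in the same spirit can be sketched in Python. The sketch below is not a reproduction of Bradley's study. The exponential population, the three sample sizes, the 5,000 replications, and the seed are all assumptions chosen for illustration. It uses skewness as a rough indicator of nonnormality: a normal distribution has skewness 0, and the exponential population has skewness 2.

```python
import random
from statistics import fmean

def skewness(values):
    """Moment coefficient of skewness; 0 for a symmetric distribution."""
    m = fmean(values)
    m2 = fmean((v - m) ** 2 for v in values)
    m3 = fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

rng = random.Random(2024)   # arbitrary seed, for repeatability
replications = 5000         # number of samples drawn at each sample size

skew_by_n = {}
for n in (2, 8, 32):
    # Each sample mean comes from n draws on an exponential population,
    # which is strongly skewed to the right.
    means = [fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(replications)]
    skew_by_n[n] = skewness(means)

for n, g in skew_by_n.items():
    print(n, round(g, 2))
```

Under these assumptions the skewness of the simulated sample means should shrink toward 0 as n increases, which is the behavior the central limit theorem leads us to expect.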



FIGURE 5.4.2 Illustration of effect of central limit theorem on the sampling distribution of $\bar{x}$

The phrase “when the sample size is large” that appeared in our statement about
the central limit theorem requires explanation. The size of the sample needed to
achieve an approximately normal sampling distribution of $\bar{x}$ depends on how nonnormal
the original population is. The greater the departure of the population
distribution from normal, the larger the sample must be. One rule of thumb states 
that the sample size should be 30 or more. We adopt this rule for the sake of 
convenience in later chapters when we apply inferential procedures. 

The previous results assume that the sample is drawn either from an infinite 
population or from a finite population with replacement. As we have pointed out, 
we do not sample with replacement in most practical situations. Because of this, 
we need to know how the sampling distribution of $\bar{x}$ behaves when we sample
without replacement from a finite population. When we do so, we can describe
the distribution of the sample mean as follows:

When sampling is without replacement from a finite population, the sampling
distribution of the sample mean will have a mean $\mu_{\bar{x}}$ equal to the population
mean $\mu$ and a variance $\sigma_{\bar{x}}^2$ equal to $(\sigma^2/n)[(N - n)/(N - 1)]$. When the sample
size is large enough, the central limit theorem applies, and the sampling distribution
will be approximately normally distributed.





The use of the central limit theorem when sampling is without replacement may 
not be valid for certain populations, as discussed by Cochran (1977). 

Let us now try to verify the results relating to the mean and variance with a 
sampling experiment using the data of Example 5.4.1. We cannot verify that $\bar{x}$ is
approximately normally distributed, since from a population of size 10 it is impossible
to draw samples large enough to apply the central limit theorem. We
should, however, be able to verify the statements about $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}^2$.

If we take samples of size 2 without replacement, the resulting sample means 
are those shown above the principal diagonal in Table 5.4.1. The possible sample 
means are shown also below the diagonal, but with the order of drawing values 
reversed. We see that there are 45 of these. We can compute the means below 
the diagonal from the samples that result when the order of selection is the reverse 
of that which yields the sample means above the diagonal. You may verify that 
you can also obtain the following means and variances if you use all 90 of the 
off-diagonal means in Table 5.4.1. In performing these calculations, we have 
ignored, for simplicity, the order in which the elements of the samples were drawn. 

In general, when we draw samples of size n from a finite population of size N 
without replacement, if we ignore the order in which the elements are selected, 
the number of possible samples is given by the combination of N things taken n 
at a time. In our present example, 

$$\binom{N}{n} = \frac{N!}{n!(N - n)!} = \frac{10!}{2!\,8!} = \frac{10 \cdot 9 \cdot 8!}{2 \cdot 1 \cdot 8!} = 45$$

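The mean of these 45 sample means is

$$\mu_{\bar{x}} = \frac{\sum \bar{x}_i}{45} = \frac{1.5 + 2 + \cdots + 9.5}{45} = \frac{247.5}{45} = 5.5$$

Thus we again see that $\mu_{\bar{x}} = \mu = 5.5$. The variance of this sampling distribution
is found to be

$$\sigma_{\bar{x}}^2 = \frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{45} = \frac{(1.5 - 5.5)^2 + (2 - 5.5)^2 + \cdots + (9.5 - 5.5)^2}{45} = \frac{165}{45} = 3.67$$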


This time, the variance of the sampling distribution is not equal to the population
variance divided by the sample size, since $\sigma_{\bar{x}}^2 = 3.67 \neq 8.25/2 = 4.125$. But

$$\frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1} = \frac{8.25}{2} \cdot \frac{8}{9} = \frac{66}{18} = 3.67$$

and we have verified the fact that in this example

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}$$

Applications 


We may ignore the factor (N - n)/(N - 1), called the finite population correction
(fpc), when the sample size is small in comparison to the population size.
When the population is a great deal larger than the sample, the difference between
$\sigma^2/n$ and $(\sigma^2/n)[(N - n)/(N - 1)]$ is negligible. Suppose that a sample of size
25 is drawn from a population containing 10,000 observations. The finite population
correction would be equal to (10,000 - 25)/(9999) = 0.9976. The product
of 0.9976 and $\sigma^2/n$ is almost equal to the product of $\sigma^2/n$ and 1. Most statisticians
do not use the finite population correction when the sample contains less than 5%
of the observations in the population. Note that, for samples of size n > 1, $\sigma_{\bar{x}}$ is
always smaller when sampling is without replacement than when sampling is with
replacement.
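
To see how quickly the correction approaches 1 as the sampling fraction shrinks, we can tabulate it for a few sample sizes. The short Python sketch below does this for a population of 10,000. The function name `fpc` and the sample sizes other than 25 are our own choices for illustration.

```python
def fpc(N, n):
    """Finite population correction (N - n) / (N - 1)."""
    return (N - n) / (N - 1)

# Sampling fractions of 0.25%, 5%, and 20% of a population of 10,000.
for n in (25, 500, 2000):
    print(n, round(fpc(10_000, n), 4))
```

The first line printed corresponds to the sample of size 25 discussed above.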


We have talked about the sampling distribution of $\bar{x}$ in detail here so that you will
be able to apply the concept confidently later, in making inferences. We need not
wait for Chapter 6, however, to apply this material. We can now answer questions
such as this: “Given a population with mean $\mu$ and variance $\sigma^2$, what is the
probability that a simple random sample of size n will yield a sample mean $\bar{x}$ as
large as or larger than some specified value $\bar{x}_0$?”

EXAMPLE 5.4.2 The pressure, in pounds per square inch, required to rupture a 
certain type of fuel tank is an approximately normally distributed random variable 
with a mean of 2800 psi and a variance of 9216 psi squared. Suppose that we 
select a simple random sample of size 10 from this population and test each tank 
until it ruptures. What is the probability that the mean pressure required to rupture 
the tanks in the sample will be 2750 psi or less? 

The single sample under consideration is one of the possible samples of size
10 that we can draw from the population. The mean of this sample is one of the
$\bar{x}$'s comprising the sampling distribution of $\bar{x}$, which, theoretically, we could
derive from this population.

If the population is approximately normally distributed, this assures us that the
sampling distribution of $\bar{x}$ is, for all practical purposes, normally distributed. The
mean and standard deviation of the sampling distribution are equal to 2800 and
$\sqrt{9216/10} = 30.36$, respectively. We assume that the population is large relative
to the sample, so that we can ignore the finite population correction.

From Chapter 4 we know that we can transform any normally distributed random
variable to the standard normal distribution by means of a simple formula.
In the present example, the random variable is $\bar{x}$, and the mean and standard
deviation of its distribution are $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, respectively.

Appropriate modification of the formula for z gives us the following formula
for transforming the normal distribution of $\bar{x}$ to the standard normal distribution:

$$z = \frac{\bar{x}_0 - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x}_0 - \mu}{\sigma/\sqrt{n}} \qquad (5.4.1)$$







In this example, the probability is represented by the area to the left of $\bar{x} = 2750$ 
under the curve of the sampling distribution. This area is equal to the area to the 
left of 

$$z = \frac{2750 - 2800}{\sqrt{9216}/\sqrt{10}} = \frac{-50}{96/3.16} = -1.65$$


Table C indicates that the area to the left of -1.65 is 0.0495. Thus, we can say 
that the probability of drawing, from the specified population, a sample with a 
mean of 2750 or less is 0.0495. Figure 5.4.3 shows the relationship between the 
original population, the sampling distribution of $\bar{x}$, and the standard normal 
distribution. 
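The calculation in Example 5.4.2 can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`; the variable names are illustrative and are not part of the text. Because the code does not round $z$ to two decimals before looking up the area, its result may differ slightly from the Table C value of 0.0495.

```python
from math import sqrt
from statistics import NormalDist

mu = 2800.0          # population mean rupture pressure, psi
variance = 9216.0    # population variance, psi squared
n = 10               # sample size

# Standard error of the mean: sigma / sqrt(n)
std_error = sqrt(variance) / sqrt(n)

# Sampling distribution of the sample mean (normal because the population is)
sampling_dist = NormalDist(mu=mu, sigma=std_error)

# z value for a sample mean of 2750 and the area to its left
z = (2750.0 - mu) / std_error
prob = sampling_dist.cdf(2750.0)

print(f"standard error = {std_error:.2f}")
print(f"z = {z:.2f}")
print(f"P(sample mean <= 2750) = {prob:.4f}")
```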


EXAMPLE 5.4.3 The mean life of a certain saw blade is 41.5 hours, with a standard 
deviation of 2.5 hours. What is the probability that a simple random sample of 
size 50 drawn from this population has a mean of between 40.5 and 42 hours? 

We are not told that the population is normally distributed. However, this does 
not prevent us from using the standard normal distribution as in Example 5.4.2. 
Because the sample size is large, the central limit theorem tells us that the sampling 
distribution of $\bar{x}$ is at least approximately normally distributed regardless of how 
the population is distributed. The mean and standard deviation of the sampling 
distribution of $\bar{x}$ are $\mu_{\bar{x}} = 41.5$ and $\sigma_{\bar{x}} = 2.5/\sqrt{50} = 0.35$, respectively. The 
probability we seek is 

$$
\begin{aligned}
P(40.5 \le \bar{x} \le 42) &= P\left(\frac{40.5 - 41.5}{0.35} \le z \le \frac{42 - 41.5}{0.35}\right) \\
&= P(z \le 1.43) - P(z \le -2.86) \\
&= 0.9236 - 0.0021 = 0.9215
\end{aligned}
$$
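As a rough check on the saw-blade example, the same probability can be computed directly from the approximating normal distribution. This is a sketch only, assuming Python's standard-library `statistics.NormalDist`; the names are illustrative. The code keeps the unrounded standard error (about 0.354 rather than 0.35), so it gives a value near 0.92 that is close to, but not identical with, the table-based 0.9215.

```python
from math import sqrt
from statistics import NormalDist

mu = 41.5      # population mean blade life, hours
sigma = 2.5    # population standard deviation, hours
n = 50         # sample size, large enough to invoke the central limit theorem

# Standard error of the sample mean
std_error = sigma / sqrt(n)

# Approximate sampling distribution of the sample mean
sampling_dist = NormalDist(mu=mu, sigma=std_error)

# P(40.5 <= sample mean <= 42) as a difference of two left-tail areas
prob = sampling_dist.cdf(42.0) - sampling_dist.cdf(40.5)

print(f"standard error = {std_error:.3f}")
print(f"P(40.5 <= sample mean <= 42) = {prob:.4f}")
```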


To illustrate the use of the fpc, let us suppose that the population referred to 
in Example 5.4.3 consists of 800 tools. The standard error in that case would be 

$$\sigma_{\bar{x}} = \sqrt{\frac{(2.5)^2 (800 - 50)}{50(800 - 1)}} = \sqrt{(0.125)(0.938673)} = 0.34$$

The desired probability, then, would be 

$$P(40.5 \le \bar{x} \le 42) = P\left(\frac{40.5 - 41.5}{0.34} \le z \le \frac{42 - 41.5}{0.34}\right) = 0.9292 - 0.0016 = 0.9276$$


The result we find using the fpc is not very different from the result we find 
without it, even though the sample contains 6.25% of the population. 
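The comparison with and without the fpc can be sketched in a few lines. The helper names `standard_error` and `prob_between` are hypothetical, introduced only for this illustration, and the code assumes Python's standard-library `statistics.NormalDist`. Since nothing is rounded along the way, the printed probabilities differ a little from the table-based 0.9215 and 0.9276.

```python
from math import sqrt
from statistics import NormalDist

def standard_error(sigma, n, population_size=None):
    """Standard error of the mean, with the fpc if a population size is given."""
    se = sigma / sqrt(n)
    if population_size is not None:
        # Finite population correction: sqrt((N - n) / (N - 1))
        se *= sqrt((population_size - n) / (population_size - 1))
    return se

def prob_between(low, high, mu, se):
    """Area under the normal sampling distribution between low and high."""
    dist = NormalDist(mu=mu, sigma=se)
    return dist.cdf(high) - dist.cdf(low)

se_plain = standard_error(2.5, 50)                      # fpc ignored
se_fpc = standard_error(2.5, 50, population_size=800)   # population of 800 tools

prob_plain = prob_between(40.5, 42.0, 41.5, se_plain)
prob_fpc = prob_between(40.5, 42.0, 41.5, se_fpc)

print(f"without fpc: se = {se_plain:.4f}, probability = {prob_plain:.4f}")
print(f"with fpc:    se = {se_fpc:.4f}, probability = {prob_fpc:.4f}")
```

The correction shrinks the standard error, so slightly more of the sampling distribution falls inside the interval.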



FIGURE 5.4.3 Distribution of a population, the sampling distribution of $\bar{x}$, and the standard normal distribution 

5.4.1 The tensile strength of a certain type of wire is normally distributed with a mean of 
99.8 and a standard deviation of 5.48. (a) What are the mean and standard deviation of 
the sampling distribution of the sample mean based on simple random samples of size 
100? (b) You draw a single simple random sample of 16 values from this population. 
What is the probability that the mean of this sample will be between 98.8 and 100.9? 

5.4.2 An employment agency has found that the mean time required for an applicant to 
take an aptitude test is 24.5 minutes, with a standard deviation of 4.5 minutes. (a) What 
are the mean and standard deviation of the sampling distribution of the sample mean based 
on simple random samples of size 81 from this population? (b) You draw a simple random 
sample of 81 applicant files. What is the probability that the mean time applicants in this 
sample need for taking the test is greater than 25 minutes? 



5.4.3 A firm employs 1500 people. During a given year, the mean amount contributed to 
a charity drive per employee was $25.75. The standard deviation was $5.25. What is the 
probability that a simple random sample of 100 employees yields a mean between $25.00 
and $27.00? 

5.4.4 In a population of 1200 executives, the mean amount spent on lunch per day is 
$6.50. The standard deviation is $6.00. What is the probability that a simple random 
sample of 36 executives from this population yields a mean between $5.00 and $10.00? 

5.4.5 Select a simple random sample of 125 subjects from the population of employed 
heads of households given in Appendix II. Use commuting distance to work as the variable 
of interest. Construct a frequency distribution, a histogram, and a frequency polygon. 
Compute the mean commuting distance and the variance. Compare your results with those 
of your classmates. 

5.4.6 Suppose that a population consists of the values 2, 4, 6, 8, and 10. Construct the 
sampling distribution of $\bar{x}$ based on samples of size 2 selected without replacement. Find 
the mean and variance of the original population and of the sampling distribution. 

5.4.7 In a population of assembly-line workers, the mean length of employment with their 
present firm is 2.5 years. The standard deviation is 3 years. A simple random sample of 
40 is drawn from this population. What is the probability that the mean will be more than 
3.5 years? 


5.5 DISTRIBUTION OF THE DIFFERENCE BETWEEN 
TWO SAMPLE MEANS 

In practical situations we are often interested in the difference between two 
population means. In order to make inferences about this difference from sample data, 
we need to know the properties of the sampling distribution of the difference 
between two sample means, $\bar{x}_1 - \bar{x}_2$. 

In practice, we would not try to actually construct the sampling distribution of 
the difference between two means. We can, however, easily conceptualize its 
construction when the two populations of interest are finite. First we select, without 
replacement, from population 1 all possible simple random samples of size $n_1$ 
and compute the mean for each sample. There are $\binom{N_1}{n_1}$ such samples, where $N_1$ is 
the population size and $n_1$ is the size of the sample drawn from population 1. 
Next we select, without replacement, all possible simple random samples of size 
$n_2$ from population 2 and compute the mean for each of these. Then we form all 
possible pairs of sample means, taking one mean from population 1 and one mean 
from population 2. The samples composing each pair are independent. We then 
compute the difference between each of these possible pairs of means. Table 5.5.1 
shows the results. 
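The construction just described can be carried out by brute force for very small populations. The sketch below is illustrative only: the two populations and the sample sizes are hypothetical, and the helper `variance_of_mean_without_replacement` is a name introduced here. Because sampling is without replacement from tiny populations, the variance of each sample mean includes the finite population correction factor $(N - n)/(N - 1)$.

```python
from itertools import combinations, product
from statistics import fmean, pvariance

# Hypothetical finite populations, chosen only to keep the enumeration small
pop1 = [2.0, 4.0, 6.0, 8.0, 10.0]
pop2 = [1.0, 3.0, 5.0, 7.0]
n1, n2 = 2, 2

# Means of all possible samples drawn without replacement from each population
means1 = [fmean(s) for s in combinations(pop1, n1)]
means2 = [fmean(s) for s in combinations(pop2, n2)]

# All possible pairs of sample means, one from each population, and their differences
diffs = [m1 - m2 for m1, m2 in product(means1, means2)]

def variance_of_mean_without_replacement(population, n):
    """sigma^2 / n times the finite population correction (N - n) / (N - 1)."""
    big_n = len(population)
    return pvariance(population) / n * (big_n - n) / (big_n - 1)

mean_of_diffs = fmean(diffs)
var_of_diffs = pvariance(diffs)
expected_mean = fmean(pop1) - fmean(pop2)
expected_var = (variance_of_mean_without_replacement(pop1, n1)
                + variance_of_mean_without_replacement(pop2, n2))

print(f"number of differences: {len(diffs)}")
print(f"mean of differences: {mean_of_diffs:.4f} (mu1 - mu2 = {expected_mean:.4f})")
print(f"variance of differences: {var_of_diffs:.4f} (sum of variances = {expected_var:.4f})")
```

The mean of the enumerated differences equals the difference between the population means, and their variance equals the sum of the variances of the two sample means.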

The distribution we seek is the distribution of the differences between these 
pairs of sample means. Assume that the two populations are approximately 
normally distributed. If we plotted the sample differences against their frequency of 
occurrence, the result, for all practical purposes, would be a normal distribution 
with a mean equal to $\mu_1 - \mu_2$, the difference between the true population means, 
and a variance equal to $(\sigma_1^2/n_1) + (\sigma_2^2/n_2)$. 

TABLE 5.5.1 Working table for constructing the distribution of the difference between two sample means 

Samples from    Samples from    Sample means,    Sample means,    All possible 
population 1    population 2    population 1     population 2     differences 
$n_{11}$        $n_{12}$        $\bar{x}_{11}$   $\bar{x}_{12}$   $\bar{x}_{11} - \bar{x}_{12}$ 
$n_{21}$        $n_{22}$        $\bar{x}_{21}$   $\bar{x}_{22}$   $\bar{x}_{11} - \bar{x}_{22}$ 

This procedure is valid when the sample sizes $n_1$ and $n_2$ are either equal or 
different, and when the population variances $\sigma_1^2$ and $\sigma_2^2$ are either equal or 
different. Although we can construct the exact sampling distribution only with finite 
populations, the results of the outlined procedure also apply to infinite populations. 
We may summarize: 

Given two normally distributed populations with means $\mu_1$ and $\mu_2$, and 
variances $\sigma_1^2$ and $\sigma_2^2$, respectively, the sampling distribution of the difference 
$\bar{x}_1 - \bar{x}_2$ between the means of independent samples of size $n_1$ and $n_2$ drawn 
from these populations is normally distributed with mean $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$ 
and variance $\sigma_{\bar{x}_1 - \bar{x}_2}^2 = (\sigma_1^2/n_1) + (\sigma_2^2/n_2)$. 

Figure 5.5.1 illustrates the sampling distribution of the difference between two 
sample means. This description specifies that the samples be independent. Two 
samples are independent if the selection of elements to be included in one sample 
is in no way influenced by the selection of elements to be included in the other 
sample. Note that in Table 5.5.1 $\binom{N_1}{n_1}$ = the number of samples drawn from 
population 1, and $\binom{N_2}{n_2}$ = the number of samples drawn from population 2. 

EXAMPLE 5.5.1 Two companies manufacture high-temperature lubricants aimed at 
the same market. Company A claims that the mean temperature at which its 
product ceases to be effective is 505°F. It quotes a standard deviation of 10°. 
Company B states that corresponding data for its product are a mean of 475°F 
and a standard deviation of 7°. Experience has shown that temperatures at failure 
for both products are approximately normally distributed. Suppose that a simple 
random sample of 20 specimens of Company A’s product and an independent 
simple random sample of 25 specimens of Company B’s product are tested. What 
is the probability that the difference between the mean temperature at failure for 
the two samples will be between 25 and 35 degrees? 

From the data given here, we know that the sampling distribution of $\bar{x}_A - \bar{x}_B$ 
is normally distributed with mean $\mu_A - \mu_B$ and variance $(\sigma_A^2/n_A) + (\sigma_B^2/n_B)$. 
To find the desired probability, we transform this normal distribution to the standard 
normal distribution using an adaptation of an earlier formula. The new formula 
is: 


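$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \tag{5.5.1}$$

For the present example, we compute two values of $z$ as follows: 

$$z_1 = \frac{25 - (505 - 475)}{\sqrt{\dfrac{(10)^2}{20} + \dfrac{(7)^2}{25}}} = -1.89 \qquad z_2 = \frac{35 - (505 - 475)}{\sqrt{\dfrac{(10)^2}{20} + \dfrac{(7)^2}{25}}} = 1.89$$

The probability we seek, then, is 

$$P(-1.89 \le z \le 1.89) = P(z \le 1.89) - P(z \le -1.89) = 0.9706 - 0.0294 = 0.9412$$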
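Example 5.5.1 can be verified with a short computation. This is a sketch under the example's own assumptions (independent samples, normally distributed populations), using Python's standard-library `statistics.NormalDist`; the variable names are illustrative. Without the two-decimal rounding of $z$, the result comes out close to, but not exactly, 0.9412.

```python
from math import sqrt
from statistics import NormalDist

# Company A and Company B figures from the example
mu_a, sigma_a, n_a = 505.0, 10.0, 20
mu_b, sigma_b, n_b = 475.0, 7.0, 25

# Mean and standard error of the difference between the two sample means
mean_diff = mu_a - mu_b
se_diff = sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)

# Sampling distribution of the difference and the area between 25 and 35 degrees
diff_dist = NormalDist(mu=mean_diff, sigma=se_diff)
prob = diff_dist.cdf(35.0) - diff_dist.cdf(25.0)

print(f"mean of difference = {mean_diff:.1f}")
print(f"standard error of difference = {se_diff:.3f}")
print(f"P(25 <= difference <= 35) = {prob:.4f}")
```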


In practice, the following problems often arise: (1) the need to sample from a 
nonnormally distributed population, and (2) the need to sample from a population 
of unknown functional form. We solve these problems by taking large samples. 
When the sample sizes are large, the central limit theorem applies. The distribution 
of the difference between two sample means is then approximately normally 
distributed with a mean equal to $\mu_1 - \mu_2$ and a variance of $(\sigma_1^2/n_1) + (\sigma_2^2/n_2)$. In 
order to find the probabilities associated with specific values of the statistic, then, 
we proceed just as we do when sampling from populations that are normally 
distributed. 

EXAMPLE 5.5.2 Two methods of performing a certain task in a manufacturing plant, 
Method A and Method B, are under study. The variable of interest is length of 
time needed to perform the task. It is known that $\sigma_A^2$ is 9 min$^2$ and $\sigma_B^2$ is 12 min$^2$. 
A simple random sample of 35 employees performed the task by Method A. An 
independent simple random sample of 35 employees, similar in all important 
aspects to the first group, performed the task by Method B. The average time the 
first group needed to complete the task was 25 min. The average time for the 
second group was 23 min. What is the probability of a difference ($\bar{x}_A - \bar{x}_B$) this 
large or larger if there is no difference in the true average lengths of time needed 
for the task? 

Since the functional form of the population is not specified, and since the sample 
sizes are large (greater than 30), we can use the central limit theorem. We compute 

$$z = \frac{(25 - 23) - 0}{\sqrt{\dfrac{9}{35} + \dfrac{12}{35}}} = 2.58$$

Table C shows that the area to the right of $z = 2.58$ is 0.0049. Thus a difference 
between sample means as large as the one observed in this case is rather rare 
when the population means are equal. 
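The right-tail probability in Example 5.5.2 can be reproduced as follows. The sketch assumes Python's standard-library `statistics.NormalDist` and relies, as the example does, on the central limit theorem; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

var_a, var_b = 9.0, 12.0      # known population variances, minutes squared
n_a, n_b = 35, 35             # sample sizes
mean_a, mean_b = 25.0, 23.0   # observed sample means, minutes

# Under the assumption of no true difference, mu_A - mu_B = 0
hypothesized_diff = 0.0
se_diff = sqrt(var_a / n_a + var_b / n_b)

# z value of the observed difference and the area to its right
z = ((mean_a - mean_b) - hypothesized_diff) / se_diff
prob = 1.0 - NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"P(difference this large or larger) = {prob:.4f}")
```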

5.5.1 A market analyst studying the length of time shoppers spend in two types of grocery 
store observes a sample of 75 shoppers in each store. The mean time the sample of shoppers 
spend in Store A is 55 minutes. The mean time the sample of shoppers spend in Store B 
is 49 minutes. What is the probability of observing a sample difference ($\bar{x}_A - \bar{x}_B$) at least 
as large as this if there is no difference in the true mean time shoppers spend in the two 
stores and if the standard deviation is 15 minutes for both populations? What assumptions 
are made regarding the samples? 

5.5.2 An accountant for a department store is studying the characteristics of customers 
who have charge accounts. A customer may choose between two types of account, A or 
B. A simple random sample of 50 customers with type A charge accounts has a mean age 
of 38 years. The mean age of an independent simple random sample of 50 customers with 
type B accounts is 33 years. If both populations have the same mean age and a standard 
deviation of 10 years, what is the probability of drawing two samples with a difference 
($\bar{x}_A - \bar{x}_B$) in means at least as large as the one this accountant observed? 

5.5.3 Scores on a motor-performance test for employees who hold nonsedentary jobs 
(group 1) are normally distributed, with a mean and variance of 60 and 100, respectively. 
Scores for employees who hold sedentary jobs (group 2) are normally distributed with a 
mean of 50 and a variance of 121. A random sample of 10 employees is selected from 
group 1. An independent random sample of size 11 is selected from group 2. What is the 
probability that the difference between sample means ($\bar{x}_1 - \bar{x}_2$) is between 8 and 14? 

5.5.4 Researchers have determined that hostility scores among blue-collar workers are 
approximately normally distributed, with a variance of 400 for both high school dropouts 
and those who have finished high school. Random samples of 15 dropouts and 20 high 
school graduates yielded sample means of $\bar{x}_d = 77.50$ and $\bar{x}_g = 62.75$, respectively. 
Assume that there is no difference between the population means. What is the probability 
of obtaining sample results ($\bar{x}_d - \bar{x}_g$) as extreme as, or more extreme than, what was 
observed in these samples? 


5.6 DISTRIBUTION OF THE SAMPLE PROPORTION 

Sections 5.4 and 5.5 discussed the sampling distributions of measured variables. 
However, we often want to know the sampling distribution of statistics that arise 
from data that consist of counts. An example of such a statistic is the sample 
proportion, which is a special case of the sample mean. Suppose that we know 
that in some population the proportion of elements with a particular characteristic 
is $p$. We are often interested in finding the probability of observing in a sample 
of size $n$ from this population a proportion of elements with the characteristic of 
interest as extreme as or more extreme than some specified value $p_0$. To do this 
we need to know the properties of the sampling distribution of the sample 
proportion $\hat{p}$. 

This problem is related to the problems in Chapter 4 that we solved by means 
of the binomial distribution. Those problems involved determining the probability 
of observing a certain number of elements with some characteristic in a sample 
of size n from a population in which a proportion p of the elements had that 
characteristic. Here we are interested in the proportion, rather than the number, 
that have the characteristic of interest. The two problems are related, since the 
sample proportion is equal to the number in the sample with the characteristic 
divided by the sample size. 

When the entities in a population can assume only one of two values, we usually 
call them success and failure. Suppose that we give the value 1 to an element in 
the sample that has the characteristic of interest (success), and the value 0 to an 
element in the sample that does not have the characteristic (failure). We compute 
the sample proportion $\hat{p}$ by 

$$\hat{p} = \frac{\sum x_i}{n} \tag{5.6.1}$$

The numerator of Equation 5.6.1 is merely a count of the elements with the 
characteristic of interest (successes, or 1s). For example, suppose that in a simple 
random sample of 10 secretaries, 4 are married. The proportion married is given 
by 

$$\hat{p} = \frac{\sum x_i}{n} = \frac{4}{10}$$

Observe the similarity between Equation 5.6.1 and the formula for the sample 
mean. In fact, since the sample proportion is a special case of the sample mean, 
it is not surprising that there are similarities between the sampling distributions 
of $\hat{p}$ and $\bar{x}$. 
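The point that a sample proportion is simply a sample mean of 0s and 1s can be seen in a few lines. In this sketch the ordering of the ten observations is hypothetical; only the count of 4 successes out of 10 comes from the text.

```python
from statistics import fmean

# 1 = married (success), 0 = not married (failure); the ordering is made up
sample = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]

# Sample proportion: count of successes divided by the sample size
p_hat = sum(sample) / len(sample)

# The ordinary sample mean of the same 0/1 values
x_bar = fmean(sample)

print(f"sample proportion = {p_hat}")
print(f"sample mean of the 0/1 values = {x_bar}")
```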

We can construct the sampling distribution of a sample proportion experimentally. 
We use the same method we used to construct the sampling distributions of 
the arithmetic mean and the difference between two means. From the population, 
assumed to be finite, we take all possible random samples of a given size. For 
each sample, we compute the sample proportion $\hat{p}$. We then prepare a frequency 
distribution of $\hat{p}$. Table 5.6.1 shows the results if the samples are drawn without 
replacement. We can summarize the characteristics of the sampling distribution 
presented in Table 5.6.1 as follows. 

When the sample size is large, the distribution of sample proportions is 
approximately normally distributed by virtue of the central limit theorem. The 
mean of the distribution, that is, the average of all the possible sample 
proportions, is equal to the true population proportion $p$. The variance of the 
distribution $\sigma_{\hat{p}}^2$ is equal to $[p(1 - p)]/n$. 

The sampling distribution of $\hat{p}$ is only approximated by a normal distribution. 
For this approximation to be achieved, $n$ must be large. A common rule of thumb 
states that we should use the approximation only when the smaller of $np$ and 
$n(1 - p)$ is greater than 5. For other suggestions, see Cochran (1977). 

These results are known as the normal approximation to the binomial, which 
we discussed in Chapter 4. 
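The stated mean and variance of the sample proportion, and the rule of thumb, can be checked against the exact binomial distribution. The sketch below is illustrative: the function names and the values $n = 20$, $p = 0.3$ are hypothetical, and binomial sampling (constant $p$ from draw to draw) is assumed.

```python
from math import comb

def p_hat_distribution(n, p):
    """Exact distribution of the sample proportion under binomial sampling.

    Maps each possible value k/n to its binomial probability.
    """
    return {k / n: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

def normal_approx_reasonable(n, p):
    """Rule of thumb from the text: the smaller of np and n(1 - p) exceeds 5."""
    return min(n * p, n * (1 - p)) > 5

n, p = 20, 0.3   # hypothetical sample size and population proportion

dist = p_hat_distribution(n, p)
mean = sum(value * prob for value, prob in dist.items())
variance = sum((value - mean) ** 2 * prob for value, prob in dist.items())

print(f"mean of sample proportions = {mean:.6f} (p = {p})")
print(f"variance of sample proportions = {variance:.6f} (p(1 - p)/n = {p * (1 - p) / n:.6f})")
print(f"rule of thumb satisfied: {normal_approx_reasonable(n, p)}")
```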

There we noted that when sampling is without replacement from a finite 
population in which the variable can assume only one of two values, the actual 
sampling distribution is the hypergeometric distribution, not the binomial 
distribution. The binomial distribution results only when sampling is from an infinite 
population or from a finite population with replacement. The reason is that it is 
only under these circumstances that $p$ remains constant from draw to draw. When 
the population is very large, we can use the binomial distribution to approximate 
the hypergeometric distribution when sampling is without replacement under the 
conditions specified in Section 4.3. In turn, we can use the normal distribution to 
approximate the binomial distribution. In dealing with proportions, we assume in 
this book that the binomial distribution is an appropriate approximation. 


TABLE 5.6.1 Sampling distribution of a sample proportion 

Sample    Sample size ($n_1 = n_2 = \cdots = n$)    Sample proportion 
1         $n_1$                                      $\hat{p}_1$ 
2         $n_2$                                      $\hat{p}_2$ 
3         $n_3$                                      $\hat{p}_3$ 





EXAMPLE 5.6.1 A manufacturer of nails has found that 3% of the nails produced 
are defective. Suppose that a random sample of 300 nails is examined. What is 
the probability that the proportion defective is between 0.02 and 0.035? 

Since (0.03)(300) = 9 is greater than 5, we may use the normal approximation. 
We conclude that $\hat{p}$ is approximately normally distributed with a mean of 
$\mu_{\hat{p}} = 0.03$ and a variance of $\sigma_{\hat{p}}^2 = p(1 - p)/n$. We can transform any value of $\hat{p}$ to a 
value of the standard normal distribution using the following modification of a 
now-familiar formula: 

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} \tag{5.6.2}$$

Applying this formula, we obtain the following two values of $z$: 

$$z_1 = \frac{0.02 - 0.03}{\sqrt{\dfrac{(0.03)(0.97)}{300}}} = -1.02 \qquad z_2 = \frac{0.035 - 0.03}{\sqrt{\dfrac{(0.03)(0.97)}{300}}} = 0.51$$

The probability we seek, then, is 

$$P(-1.02 \le z \le 0.51) = P(z \le 0.51) - P(z \le -1.02) = 0.6950 - 0.1539 = 0.5411$$

which is found by referring to Table C. 
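The normal approximation used in Example 5.6.1 can be sketched as follows, assuming Python's standard-library `statistics.NormalDist`; the variable names are illustrative. Because $z$ is not rounded to two decimals, the computed probability is close to, but not identical with, 0.5411.

```python
from math import sqrt
from statistics import NormalDist

p = 0.03   # population proportion of defective nails
n = 300    # sample size

# Rule-of-thumb check before using the normal approximation
assert min(n * p, n * (1 - p)) > 5

# Standard error of the sample proportion: sqrt(p(1 - p) / n)
se = sqrt(p * (1 - p) / n)

# z values for the two limits and the area between them
z1 = (0.02 - p) / se
z2 = (0.035 - p) / se
std_normal = NormalDist()
prob = std_normal.cdf(z2) - std_normal.cdf(z1)

print(f"z1 = {z1:.2f}, z2 = {z2:.2f}")
print(f"P(0.02 <= sample proportion <= 0.035) = {prob:.4f}")
```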

Just as with the sampling distributions of $\bar{x}$, you should use the finite population 
correction factor in computing $\sigma_{\hat{p}}$ when $n$ is more than 5% of $N$. When the finite 
population correction factor has to be used, 

$$\sigma_{\hat{p}}^2 = \frac{p(1 - p)}{n} \cdot \frac{N - n}{N - 1}$$
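A small helper shows how the correction factor enters the standard error of a proportion. This is a sketch only: the function name `proportion_standard_error` and the population sizes are hypothetical, and the 5% threshold simply follows the guideline stated above.

```python
from math import sqrt

def proportion_standard_error(p, n, population_size=None):
    """Standard error of a sample proportion.

    The finite population correction is applied only when a population size
    is supplied and the sample is more than 5% of it.
    """
    se = sqrt(p * (1 - p) / n)
    if population_size is not None and n > 0.05 * population_size:
        se *= sqrt((population_size - n) / (population_size - 1))
    return se

# Hypothetical illustration: p = 0.03, n = 300
se_infinite = proportion_standard_error(0.03, 300)
se_small_pop = proportion_standard_error(0.03, 300, population_size=2000)    # n is 15% of N
se_large_pop = proportion_standard_error(0.03, 300, population_size=100000)  # n is 0.3% of N

print(f"no correction:         {se_infinite:.6f}")
print(f"N = 2000 (corrected):  {se_small_pop:.6f}")
print(f"N = 100000 (ignored):  {se_large_pop:.6f}")
```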


5.6.1 An accounting firm has found that 60% of its clients’ customers respond to initial 
requests for confirmation of their account balances. A simple random sample of 24 
customers is sent requests for confirmation. What is the probability that 50% or more respond? 

5.6.2 Suppose that we know that 5% of the forms processed by a clerical pool contain at 
least one error. If we examine a simple random sample of 475 forms, what is the probability 
that the proportion containing at least one error is between 0.03 and 0.075? 

5.6.3 It is known that 25% of the people who saw a certain television program thought 
it contained too much violence. A random sample of 200 is selected from this population. 
What is the probability that the proportion in the sample with this opinion is between 0.24 
and 0.28? 

5.6.4 An advertising agency claims that 20% of the members of a certain adult population 
have never heard a certain slogan created by the agency. In a random sample of 100 adults 
drawn from this population, 24 said they had never heard the slogan. If the agency’s claim 
is true, what is the probability of obtaining results as large as or larger than those found 
in this sample? 



5.7 DISTRIBUTION OF THE DIFFERENCE BETWEEN 
TWO SAMPLE PROPORTIONS 

There are often two population proportions of interest and we wish to determine 
the probability associated with an observed difference between two sample 
proportions, where we draw an independent sample from each of the two populations. 
The relevant sampling distribution is the sampling distribution of the difference 
between two proportions. We can describe it as follows. 

Suppose that independent random samples of size $n_1$ and $n_2$ are drawn from 
two populations, where the proportions of observations with the characteristic 
of interest in the two populations are $p_1$ and $p_2$, respectively. When $n_1$ and $n_2$ 
are large, the distribution of the difference between sample proportions 
$\hat{p}_1 - \hat{p}_2$ is approximately normal, with mean 

$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2 \quad \text{and variance} \quad \sigma_{\hat{p}_1 - \hat{p}_2}^2 = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$$

To answer probability questions about the difference between two sample 
proportions, we transform values of $\hat{p}_1 - \hat{p}_2$ to values of the standard normal 
distribution using the following formula: 

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$

To construct the sampling distribution of the difference between two proportions 
for finite populations, we follow the same procedure used for the construction of 
the sampling distribution of the difference between two sample means. 
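The transformation above can be sketched as a small function. The name `z_for_difference` is hypothetical, and the code assumes Python's standard-library `statistics.NormalDist`. The figures used are those of Example 5.7.1 (claimed proportions 0.30 and 0.20, samples of 100 each, observed sample proportions 0.34 and 0.13). Because the code keeps the unrounded standard error of about 0.0608, its $z$ and tail area differ slightly from values obtained with a denominator rounded to 0.06.

```python
from math import sqrt
from statistics import NormalDist

def z_for_difference(p_hat_1, p_hat_2, p1, p2, n1, n2):
    """z value of an observed difference between two sample proportions."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return ((p_hat_1 - p_hat_2) - (p1 - p2)) / se

# Claimed population proportions, sample sizes, and observed sample proportions
z = z_for_difference(p_hat_1=0.34, p_hat_2=0.13, p1=0.30, p2=0.20, n1=100, n2=100)

# Area to the right of z: probability of a difference this large or larger
prob = 1.0 - NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"P(difference this large or larger) = {prob:.4f}")
```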


EXAMPLE 5.7.1 It is claimed that 30% of the households in Community A and 
20% of the households in Community B have at least one teenager. A simple 
random sample of 100 households from each community yields the following 
results: $\hat{p}_A = 0.34$, $\hat{p}_B = 0.13$. What is the probability of observing a difference 
this large or larger if the claims are true? 

We assume that if the claims are true, the sampling distribution of $\hat{p}_A - \hat{p}_B$ is 
approximately normally distributed, with a mean of 

$$\mu_{\hat{p}_A - \hat{p}_B} = 0.3 - 0.2 = 0.1$$

and a variance of 

$$\sigma_{\hat{p}_A - \hat{p}_B}^2 = \frac{(0.3)(0.7)}{100} + \frac{(0.2)(0.8)}{100} = 0.0037$$

The observed difference in sample proportions is 

$$\hat{p}_A - \hat{p}_B = 0.34 - 0.13 = 0.21$$

The probability we want is represented by the area to the right of 0.21 in the 
sampling distribution of $\hat{p}_A - \hat{p}_B$. To find this area, we compute 

$$z = \frac{0.21 - 0.10}{\sqrt{0.0037}} = \frac{0.11}{0.06} = 1.83$$

and consult Table C. We find that the area to the right of $z = 1.83$ is 0.0336. 
Thus, if the claims are true, the probability of observing a difference as large as or 
larger than that actually observed is 0.0336. 

Exercises 

5.7.1 A research group states that 16% of the firms of type A increased their market 
research budgets in the past five years. For type B firms, the figure was 9%. (a) What are 
the mean and standard deviation of the sampling distribution of the difference between 
sample proportions, based on independent simple random samples of 100 firms of each 
type? (b) What proportion of the sample differences ($\hat{p}_A - \hat{p}_B$) would be between 0.05 
and 0.10? (c) Suppose that you took a simple random sample of size 100 from each 
industry. What is the probability that the difference you would observe would be equal to 
or less than 0.02? 

5.7.2 In a certain community, it is felt that 40% of the householders prefer a grocery store 
of a particular chain. In another community, it is felt that only 14% of the householders 
prefer a store of this chain. If these figures are correct, what is the probability that simple 
random samples of 100 from each community would yield a difference ($\hat{p}_1 - \hat{p}_2$) in the 
proportion of householders preferring this type of store of 0.42 or more? 

5.7.3 In a population of executives (Population A), 35% say that when they fly they prefer 
a certain airline. In another population of executives (Population B), 50% prefer this airline. 
What is the probability that a simple random sample of size 100 from each population 
would yield a difference ($\hat{p}_B - \hat{p}_A$) of 0.30 or more? 

5.7.4 A realtor claims that 40% of the homes in a certain neighborhood are appraised at 
$100,000 or more. A random sample of 75 homes from this area and 90 homes from 
another area yielded a difference in proportion of homes appraised at $100,000 or more, 
$\hat{p}_1 - \hat{p}_2$, of 0.09. Suppose that there is actually no difference between the two population 
proportions. What is the probability of observing a difference $\hat{p}_1 - \hat{p}_2$ this large or larger? 

Summary 

This chapter is the most important one in the book. Unless you understand the 
concepts presented here, you can never truly understand statistical inference. For 
this reason, you should review this chapter carefully before you proceed further. 
Clear up now any points that you don’t understand. Remember that knowing how 
to get correct answers to exercises does not mean that you understand the concepts 
they illustrate. The why of the exercises is just as important as the how. 

This chapter introduced you to the concept of a probability sample. The statistical 
inference procedures discussed in the rest of this book depend for their 
validity on the assumption that the samples being analyzed are probability samples. 
You should now be familiar with the simple random sample, one of several kinds 
of probability samples. In Chapters 6 and 7, we assume that the samples on which 
our inferences are based are simple random samples. 

The main concern of this chapter is sampling distributions. You should know 
what a sampling distribution is, and how to construct one from a small finite 
population. And, for the sampling distributions discussed, you should know the 
mean, the standard error, and the functional form. If you do not yet know these 
things, study this chapter some more before you begin Chapter 6. 

Another important concept that you should now understand is the central limit 
theorem. 

Finally, in this chapter you learned the characteristics of the sampling 
distributions of four statistics: the mean, the difference between two means, a 
proportion, and the difference between two proportions. You will encounter these 
sampling distributions again in Chapters 6 and 7. 

Review Questions 

1. What are the two types of sampling? 

2. Why is nonprobability sampling not covered in this text? 

3. Define or explain the following terms: (a) probability sample, (b) simple random 
sample, (c) sampling with replacement, (d) sampling without replacement, (e) sampling 
distribution. 

4. Explain how to construct a sampling distribution from a finite population. 

5. Describe the sampling distribution of the sample mean when sampling is with 
replacement from a normally distributed population. 

6. Explain the central limit theorem. 

7. How does the sampling distribution of the sample mean when sampling is without 
replacement differ from the sampling distribution obtained when sampling is with 
replacement? 

8. Describe the sampling distribution of the difference between two sample means. 

9. Describe the sampling distribution of the sample proportion when large samples are 
drawn. 

10. Describe the sampling distribution of the difference between two sample means when 
large samples are drawn. 

11. Explain the procedure you would follow in constructing the sampling distribution of 
the difference between sample proportions based on large samples from finite populations. 

12. Using a table of random numbers, from some real population of at least 100 
observations, draw a simple random sample of size 10. Present your sample results according 
to the format of Table 5.2.2. 

13. A population has a mean and standard deviation of 32 and 12, respectively. Consider 
the sampling distribution of the sample mean based on simple random samples of size 64. 
(a) What are the mean and standard deviation of the sampling distribution? (b) What 
proportion of sample means is between 30 and 35? (c) What is the probability that the 
mean of a single sample is greater than 35? (d) Less than 30? (e) Draw a picture of the 
sampling distribution and label the areas and points of interest as specified in (a) through 
(d). (f) What assumption is made about the sample specified in (c)? 

14. Suppose that two normally distributed populations have the following parameters: 
Population I, $\mu = 60$, $\sigma = 8$. Population II, $\mu = 50$, $\sigma = 10$. A simple random sample 
of size 16 from population I and an independent simple random sample of size 20 from 
population II yield means of 61 and 45, respectively. What is the probability of observing 
a difference ($\bar{x}_{I} - \bar{x}_{II}$) this large or larger between the sample means? 




















15. Suppose that 10% of all employees of a certain type who are fired during a certain 
period are fired because they violated company policy. In a simple random sample of 100 
of these discharged employees, what is the probability that the proportion fired for violation 
of company policy is 0.15 or more? 

16. Given the following population proportions: $p_1 = 0.6$, $p_2 = 0.5$. (a) Describe the 
sampling distribution of the difference ($\hat{p}_1 - \hat{p}_2$) between sample proportions when $n_1 = 100$ 
and $n_2 = 50$. (Assume that the sample sizes are small relative to the population sizes.) 
(b) What proportion of sample differences would be between 0.03 and 0.24? (c) What is 
the probability that samples of these sizes would yield a difference in sample proportions 
greater than 0.31? 

17. The mean time city bus drivers require to complete a round trip via Route 1 is 80 
minutes with a standard deviation of 3 minutes. For Route 2, the mean and standard 
deviation are 75 and 2, respectively. What is the probability that a random sample of 40 
trips from Route 1 and an independent sample of 50 trips from Route 2 yield a difference 
($\bar{x}_1 - \bar{x}_2$) between sample means of 6 or more? 

18. A random sample of 50 reports by householders in city 1 yields a mean monthly 
utilities payment of $180. An independent random sample of 45 reports of householders 
in city 2 yields a mean monthly payment of $175. Suppose that there is no difference in 
the true mean monthly utilities payment for the two cities. What is the probability of 
observing a difference between sample means ($\bar{x}_1 - \bar{x}_2$) as large as or larger than $5? 
Assume that $\sigma^2 = 225$ for both cities. 

19. The mean yield per acre of a certain grain in one locality is 100 lb. The yield in 
another locality is 75 lb. Suppose that yields per acre in the two localities are normally 
distributed, with a standard deviation of 20 lb. What is the probability that random and 
independent samples of 20 acres from each locality will yield a difference ($\bar{x}_1 - \bar{x}_2$) 
between sample means of 10 or less? 

20. Physical fitness scores of a certain population of executives are normally distributed 
with a mean and standard deviation of 75 and 10, respectively. What is the probability 
that a random sample of 25 such executives has a mean score between 70 and 78? 

21. The mean number of years of experience of a certain population of salespersons is 10 
years. The standard deviation is 3 years. What is the probability that a random sample of 
81 of these salespersons yields a mean greater than 10 years and 8 months? 

22. In a certain town, 18% of the teenage boys regularly ride a motorcycle. A random 
sample of 100 teenage boys is selected from this town. What is the probability that between 
15 and 25% are regular motorcycle riders? 

23. A study of a certain suburb reveals that 70% of the families moved into the town 
during the past 5 years. A random sample of 200 families is drawn from this town. What 
is the probability that the proportion who have moved into the town within the past 5 years 
is between 0.65 and 0.75? What is the probability that the proportion is greater than 0.75? 

24. Of the 1150 middle managers in a certain area, 60% hold M.B.A. degrees. You select 
a random sample of 150 of them. What is the probability that the proportion in the sample 
with M.B.A. degrees is between 0.50 and 0.65? 

25. It is believed that 0.16 of the households in Metropolitan Area I have at least one 
preschool child. The proportion in Metropolitan Area II is believed to be 0.11. If these 
figures are accurate, what is the probability that a random sample of 200 households from 
Area I and an independent simple random sample of 225 households from Area II will 
yield a difference in sample proportions ($\hat{p}_{I} - \hat{p}_{II}$) as large as or larger than 0.10? 






26. In a random sample of 150 children from Area A, 45 report that they regularly eat a 
certain breakfast cereal. In an independent random sample of 200 children from Area B, 
20 say that they regularly eat the cereal. Suppose that the proportion who regularly eat the 
cereal is actually 0.15 in each population. What is the probability of observing sample 
results ($\hat{p}_A - \hat{p}_B$) this extreme or more extreme? 

27. Two drugs, A and B, are believed to be equally effective in preventing insomnia. The 
proportion of persons with whom the drugs are effective is believed to be 0.70. In a 
random sample of 100 persons who are given Drug A, 75 experience relief. Drug B is 
effective with 105 of an independent random sample of 150 subjects. Suppose that the 
two drugs are, in fact, equally effective, as believed. What is the probability of observing 
a value of $\hat{p}_A - \hat{p}_B$ as large as or larger than that reported here? 

28. It is believed that 15% of the members of Population A have tried a certain brand of 
shampoo, but only 8% of the people in Population B have tried it. Suppose that these 
figures are accurate. What is the probability that a random sample of 120 people from 
Population A and an independent random sample of 130 people from Population B will 
yield a value of $\hat{p}_A - \hat{p}_B$ equal to or greater than 0.16? 

29. The weights of a certain kind of steel product are approximately normally distributed 
with a mean of 2800 lb and a variance of 9000 lb$^2$. Suppose that a random sample of size 
10 is to be selected from this population. What is the probability that the mean weight of 
the sample is 2750 lb or less? 

30. The mean weight gain of a certain breed of dog fed a given puppy chow for a year 
is 24.5 lb with a standard deviation of 4.5 lb. (a) What are the mean and standard deviation 
of the sampling distribution of the sample mean, based on random samples of size 81 from 
this population? (b) A random sample of 81 puppies is fed the chow for a year. What is 
the probability that the mean weight gain in this sample is greater than 25 lb? 

31. From Table D, select a simple random sample of 30 one-digit numbers. Compute the 
mean and variance for your sample. Using your results and those of the other students in 
your class, construct a frequency distribution of the sample means. Plot the distribution 
as a histogram. Compute the mean and variance of the sample means obtained by the 
class. Compare your results with the mean and variance of the true sampling distribution 
based on samples of size 30. They are $\mu_{\bar{x}} = 4.5$ and $\sigma_{\bar{x}}^2 = 8.25/30 = 0.275$. 

32. For a population consisting of 1000 employees, the variable of interest is number of 
days of accrued annual leave. Suppose that the mean and variance are 12 and 144, respectively. 
What is the probability that a sample of size 100 will yield a mean greater than 9? 

33. What is the probability that a sample of size 225 will yield a mean of 30 or less, 
given that the sampled population has a mean of 35 and a standard deviation of 30? The 
population size is 1500. 








6. Statistical Inference I: Estimation 


Chapter Objectives: This chapter discusses estimation, 
one of the two kinds of statistical inference procedures. 
(We shall discuss the other type, hypothesis testing, 
in Chapter 7.) There are two kinds of estimation: point 
estimation and interval estimation. Interval estimation 
is the more useful of the two. This chapter also gives 
you a chance to use what you learned earlier about 
probability, probability distributions, and sampling 
distributions. After studying this chapter and working the 
exercises, you will be able to do the following. 

1. Define statistical inference 

2. Discuss the properties of a good estimator 

3. Construct confidence intervals for the following 
parameters: (a) a population mean, (b) a population 
proportion, (c) a population variance, (d) the 
difference between two population means, (e) the 
difference between two population proportions, (f) the 
ratio of two population variances, and (g) a mean of a 
population of paired differences 

4. Determine how large a sample to draw from a 
population when the objective is to estimate either a 
population mean or a population proportion 

5. Describe the t distribution and discuss when its use is 
appropriate 

6. Choose the correct reliability factor (z or t) for 
constructing a confidence interval 

7. Explain the difference between probabilistic and 
practical interpretations of a confidence interval 



6.1 INTRODUCTION 


Sampled 
Populations and 
Target Populations 


We discussed the foundations of statistical inference in Chapters 3, 4, and 5, 
which were concerned with the concepts of probability, probability distributions, 
and sampling distributions. 

In Chapter 2 we said that the motivation for analyzing data was the desire for 
insight into the nature of the data at hand. We computed a mean and a variance 
in order to describe a given set of data. Any conclusions we reached related only 
to those data. The calculation of a mean and variance (and standard deviation) 
takes on a new dimension in the area of statistical inference. Our interest now 
centers on what these measures can tell us about some larger body of data. 
Statistical inference, then, is defined as follows. 

Statistical inference is the procedure whereby inferences about a population 
are made on the basis of the results obtained from a sample drawn from that 
population. 

There are several reasons why you may want to draw and analyze a sample in 
order to reach a decision about a population. A population may be so large that 
examining it in its entirety would demand prohibitive amounts of money, time, 
or resources. Or the process of taking a measurement may be destructive. 
Consider, for example, a manufacturer of light bulbs who wants information on the 
average lifetime of the product. The impracticality of testing every element (light 
bulb) of the population is obvious. The manufacturer can get the desired 
information only by means of sampling and inference. 

There are two types of statistical inference: (1) estimation and (2) hypothesis 
testing. We shall discuss the first of these in this chapter. We shall present 
hypothesis testing in Chapter 7. 

Before we begin our discussion of estimation, consider the following examples 
of situations in which we might use sampling for the purpose of making inferences. 

1. A firm is considering the establishment of a mobile home park in a certain 
area. The firm needs to know the average monthly rental fees for mobile homes 
in order to reach a decision on whether or not to develop the park. 

2. The manager of a retail grocery chain is interested in knowing what proportion 
of the customers in a given week are regular shoppers at the chain’s stores. 

3. The personnel manager of a large organization wishes to know the average 
age of the employees. 

4. An advertising executive wants to know what proportion of subscribers to a 
certain magazine remember a particular ad. 

5. An employment agency wishes to know the average salary being paid to people 
employed in a certain job classification. 

In applying statistical inference, you must know the difference between the sampled 
population and the target population. The sampled population is the population 
from which we actually draw the sample. The target population is the 
population about which we want information. The two may or may not be the 
same. Statistical inference, properly used, lets us make inferences about a 
(properly) sampled population. Statistical procedures do not help us to reach decisions 
about a target population if that target population is different from the sampled 
one. 

Suppose that you wish to know what percentage of the households in a certain 
city have central air conditioning. Someone might propose that you select a simple 
random sample of households from the telephone book and base an inference 
about the population of households in the city on the information provided by this 
sample. You should ask yourself whether, in this case, the target population 
(households in the city) and the proposed sampled population (households listed 
in the phone book) are the same. A little reflection will probably convince you 
that they are not. What about households without phones? What about households 
with unlisted phone numbers? 

In many situations the target population and the sampled population are the 
same. Then inferences about the target population are straightforward. You should, 
however, be aware that they may be different so that you do not fall into the trap 
of making unwarranted inferences about a population that is different from the 
one that you sampled. 


6.2 PROPERTIES OF GOOD ESTIMATORS 

We can distinguish two types of estimate: point estimates and interval estimates. 

A point estimate is computed from the data of a sample. It consists of a single 
value (of a statistic) used as the best conjecture as to what the corresponding 
population value (parameter) may be. 

We shall define an interval estimate in Section 6.3. 

Note that an estimate is a specific numerical value. We make a distinction 
between an estimate and an estimator. An estimator is the procedure or rule, 
usually expressed as a mathematical formula, that tells how an estimate is 
computed. An example of an estimator is 

$$\bar{x} = \frac{\sum x_i}{n}$$

which is the estimator used to obtain an estimate of a population mean. 

One aspect of point estimators has to do with whether a particular estimator is 
good or poor. Estimators are usually judged on the basis of the following criteria: 
(1) unbiasedness, (2) consistency, (3) efficiency, and (4) sufficiency. A rigorous 
treatment of these criteria is beyond the mathematical level of this text. However, 
a brief discussion will be of value. (If you want to explore the topic more fully, 
see the mathematical statistics texts cited in Chapter 5.) 

Unbiasedness 

An estimator is said to be an unbiased estimator of a population parameter if the 
mean value of the statistic computed from all possible simple random samples of 
a given size drawn from that population is equal to the corresponding parameter; 
that is, if the expected value of the statistic is equal to the parameter. If $\theta$ is the 
parameter being estimated and $\hat{\theta}$ (read “theta hat”) is an unbiased estimator of 
$\theta$, we express this fact symbolically as 

$$E(\hat{\theta}) = \theta$$

The left-hand term reads, “the expected value of $\hat{\theta}$.” The sample arithmetic mean 
$\bar{x}$ is an unbiased estimator of the population mean $\mu$, since $E(\bar{x}) = \mu$. The example 
in Chapter 5 showing the construction of the sampling distribution of the sample 
mean illustrated this fact. 

The sample variance, computed by 

$$\frac{\sum (x_i - \bar{x})^2}{n}$$

is not an unbiased estimator of $\sigma^2$. The sample variance calculated by this formula 
serves only as a measure of dispersion for the sample data. When we want to use 
sample data to compute an unbiased estimate of the population variance, we alter 
the estimator slightly. We divide the sum of the squared deviations of the values 
from their mean by $n - 1$ rather than $n$. To illustrate the fact that 

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

provides an unbiased estimate of the population variance, refer to the sampling 
distribution of $\bar{x}$ that we constructed in Section 5.4. The population consisted of 
the values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. The following measure of dispersion was 
computed from these data: 

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = 8.25$$

If we compute, for each of the samples shown in Table 5.4.1, the sample variance 

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

we obtain the sample variances shown in Table 6.2.1. 

Consider first the case in which sampling is with replacement. To obtain the 
expected value of $s^2$, we find the mean of all the sample variances in Table 6.2.1. 
That is, 

$$E(s^2) = \frac{0 + 0.5 + \cdots + 0}{100} = \frac{825}{100} = 8.25$$

For this example, when sampling is with replacement, $E(s^2) = \sigma^2$ and $s^2$ is an 
unbiased estimator of $\sigma^2$. 

TABLE 6.2.1 Sample variances computed from the samples shown in Table 5.4.1 

First                              Second draw 
draw        1      2      3      4      5      6      7      8      9     10 
   1        0    0.5      2    4.5      8   12.5     18   24.5     32   40.5 
   2      0.5      0    0.5      2    4.5      8   12.5     18   24.5     32 
   3        2    0.5      0    0.5      2    4.5      8   12.5     18   24.5 
   4      4.5      2    0.5      0    0.5      2    4.5      8   12.5     18 
   5        8    4.5      2    0.5      0    0.5      2    4.5      8   12.5 
   6     12.5      8    4.5      2    0.5      0    0.5      2    4.5      8 
   7       18   12.5      8    4.5      2    0.5      0    0.5      2    4.5 
   8     24.5     18   12.5      8    4.5      2    0.5      0    0.5      2 
   9       32   24.5     18   12.5      8    4.5      2    0.5      0    0.5 
  10     40.5     32   24.5     18   12.5      8    4.5      2    0.5      0 

Now consider the case in which sampling is without replacement. If we ignore 
the order in which the elements were selected in obtaining the samples, we get 
the expected value of $s^2$ by computing the mean of the 45 variances above (or 
below) the principal diagonal. Thus 

$$E(s^2) = \frac{0.5 + 2 + \cdots + 0.5}{45} = \frac{412.5}{45} = 9.17$$

which is not equal to $\sigma^2$. 

We may ask whether there is some parameter to which $E(s^2)$ is equal. The 
answer is yes. If we define the population variance slightly differently, we find a 
parameter to which $E(s^2)$ is equal. Let us define the population variance as 

$$S^2 = \frac{\sum (x_i - \mu)^2}{N - 1} \qquad (6.2.1)$$

In other words, we find the measure of dispersion defined by $S^2$ by dividing the 
sum of squared deviations (of observations from their mean) by $N - 1$ rather 
than by $N$. 

When we compute $S^2$ for the population described in Example 5.4.1, we have 

$$S^2 = \frac{(1 - 5.5)^2 + (2 - 5.5)^2 + \cdots + (10 - 5.5)^2}{10 - 1} = 9.17$$

Thus we see that, for this example, when sampling is without replacement, 
$E(s^2) = S^2$. 







Consistency 


Efficiency 


Again, $s^2$ is an unbiased estimator of the population variance if the latter is 
defined as $S^2$. We obtain the same results when we consider all 90 of the 
off-diagonal sample variances in Table 6.2.1. 

These results are examples of general principles that can be proved 
mathematically. They can be summarized as follows: 

$E(s^2) = \sigma^2$ when sampling is with replacement 

$E(s^2) = S^2$ when sampling is without replacement from a finite population 

where 

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

and 

$$S^2 = \frac{\sum (x_i - \mu)^2}{N - 1}$$

These results justify using $s^2 = \sum (x_i - \bar{x})^2/(n - 1)$ to compute the sample 
variance. 
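
The two results above can be checked by brute force for the ten-value population of Section 5.4. The following Python sketch is an illustration only, not part of the original text: it enumerates every sample of size 2, with and without replacement, and averages the sample variances. The variable names are hypothetical, and the standard-library `statistics` functions stand in for the hand computations behind Table 6.2.1.

```python
from itertools import combinations, product
from statistics import mean, pvariance, variance

# The population of Section 5.4: the values 1, 2, ..., 10.
population = list(range(1, 11))

# With replacement: all 100 ordered samples of size 2 (every cell of Table 6.2.1).
with_replacement = [variance(sample) for sample in product(population, repeat=2)]

# Without replacement, ignoring order: the 45 pairs above the principal diagonal.
without_replacement = [variance(sample) for sample in combinations(population, 2)]

sigma_sq = pvariance(population)  # sum of squared deviations divided by N
big_s_sq = variance(population)   # sum of squared deviations divided by N - 1

print(mean(with_replacement), sigma_sq)     # 8.25 8.25
print(mean(without_replacement), big_s_sq)  # both about 9.17
```

Here `variance` divides by one less than the number of values and `pvariance` divides by the number of values, so the same two functions supply both the sample variances and the two population measures.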

The denominator $(n - 1)$ is called the degrees of freedom. We can explain the 
concept of degrees of freedom as it applies to the calculation of the sample variance 
on an intuitive basis as follows: Since the deviations $(x_i - \bar{x})$ must 
sum to 0, only $n - 1$ of these deviations are independent. The last deviation is 
automatically specified when $n - 1$ of them are known. Thus we say that only 
$n - 1$ degrees of freedom are available for estimating the population variance. 
You will encounter the concept of degrees of freedom many times in this text. 
For a more general, and more rigorous, discussion of this concept, see the articles 
by Walker (1940) and Good (1973). 

When $N$ is large, $N - 1$ and $N$ will be approximately equal. Consequently $\sigma^2$ 
and $S^2$ will also be approximately equal. However, although $s^2$ is an unbiased 
estimator of $\sigma^2$, $s$ is not an unbiased estimator of $\sigma$. The bias diminishes rapidly 
as $n$ increases. Cureton (1968) and Gurland and Tripathi (1971) discuss this point 
in more detail. 
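
The bias of $s$ can be seen in the same ten-value population. The sketch below is an illustration under stated assumptions rather than anything taken from the cited articles: it assumes sampling with replacement, enumerates every sample of size 2, 3, and 4, and compares the average value of $s$ with $\sigma$. The names `expected_s` and `sigma` are hypothetical.

```python
from itertools import product
from statistics import mean, pstdev, stdev

population = list(range(1, 11))
sigma = pstdev(population)  # population standard deviation, sqrt(8.25)

# Average of s over every ordered sample drawn with replacement, for several n.
expected_s = {}
for n in (2, 3, 4):
    expected_s[n] = mean(stdev(sample) for sample in product(population, repeat=n))
    print(n, round(expected_s[n], 4), round(sigma, 4))
```

For every sample size the average of $s$ falls short of $\sigma$; the printout shows how large the shortfall is at each $n$ for this particular population.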










A statistic is said to be a consistent estimator if, as the sample size increases, the 
estimator approaches the population parameter being estimated. 

We have seen that the variance of $\bar{x}$, $\sigma_{\bar{x}}^2$, is equal to $\sigma^2/n$. This indicates that 
as $n$ increases, we can expect the sample means to be closer to $\mu$. Consequently 
$\bar{x}$ is a consistent estimator of $\mu$. 

It can be proved mathematically [Cramer (1946)] that an estimator is consistent 
if it is unbiased and if its variance approaches 0 as the sample size approaches 
infinity. The estimator $\bar{x}$ fulfills these conditions. As already pointed out, it is 
unbiased, and certainly $\sigma^2/n$ approaches 0 as $n$ approaches infinity. It can be 
shown, also, that $s^2$ is a consistent estimator of $\sigma^2$. 
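
As a rough numerical illustration of why $\bar{x}$ is consistent, the sketch below enumerates every sample of size 1, 2, and 4 drawn with replacement from the ten-value population of Section 5.4 and computes the variance of the resulting sample means. The sample sizes and variable names are assumptions chosen for the illustration.

```python
from itertools import product
from statistics import mean, pvariance

population = list(range(1, 11))
sigma_sq = pvariance(population)  # 8.25

# Variance of the sample mean over all ordered samples drawn with replacement.
variance_of_means = {}
for n in (1, 2, 4):
    sample_means = [mean(sample) for sample in product(population, repeat=n)]
    variance_of_means[n] = pvariance(sample_means)
    print(n, variance_of_means[n], sigma_sq / n)
```

Each printed pair agrees: the variance of the sample means equals $\sigma^2/n$, so it is halved each time the sample size doubles and shrinks toward 0 as $n$ grows.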

The efficiency of an estimator depends on its variance: One estimator is more 
efficient than another if the variance of the former is less than the variance of the 
latter in repeated sampling. We can compute a measure of relative efficiency by 
forming the ratio of the variances of two estimators. In general, the relative 
efficiency of an unbiased estimator $\hat{\theta}_1$ with respect to another unbiased estimator 
$\hat{\theta}_2$ is given by 

$$\frac{\text{Variance}(\hat{\theta}_2)}{\text{Variance}(\hat{\theta}_1)}$$

As an example, compare the sample mean with the sample median for efficiency. 
We know that the variance of $\bar{x}$ is $\sigma^2/n$. It can be shown [see Wilks (1947)] that, 
for large samples from a normally distributed population, the variance of the median 
(which, like the mean, is unbiased) is approximately equal to $\pi\sigma^2/2n$. 

The efficiency of the median relative to the mean, then, is 

$$\frac{\text{Variance}(\bar{x})}{\text{Variance}(\text{median})} = \frac{\sigma^2/n}{\pi\sigma^2/2n} = \frac{2}{\pi} = 0.64$$

This shows that, for the same sample size $n$, the variance of $\bar{x}$ is less than the 
variance of the median. Hence $\bar{x}$ is a more efficient estimator than the median. 
(This discussion of efficiency assumes that the estimators are unbiased.) 

Sufficiency 

An estimator is said to be sufficient if it utilizes all the information contained in 
the sample about the parameter being estimated. Admittedly, this is a rather 
vague statement. To be more specific, however, would require a more complex 
mathematical explanation than is desirable in this text. It is important to remember 
that if a sufficient estimator exists, it is useless to consider any nonsufficient 
estimator. The sufficient estimator has exhausted all the information in the sample 
relevant to the estimation of the parameter of interest. The sample mean $\bar{x}$ and 
sample proportion $\hat{p}$ are sufficient estimators, respectively, of $\mu$ and $p$. 
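
The efficiency of the median relative to the mean, discussed earlier in this section, can be approximated by simulation. The sketch below is a hedged illustration, not a derivation: it assumes a standard normal population, a sample size of 101, 5000 repetitions, and a fixed random seed, all chosen arbitrarily for the example.

```python
import random
from statistics import median, pvariance

def relative_efficiency(n=101, reps=5000, seed=1):
    """Estimate Variance(mean) / Variance(median) for normal samples of size n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(median(sample))
    return pvariance(means) / pvariance(medians)

ratio = relative_efficiency()
print(round(ratio, 2))  # should land near 2/pi, about 0.64
```

Because the ratio is estimated from a finite number of simulated samples, it will vary a little with the seed and with the sample size assumed.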


6.3 THE POPULATION MEAN-KNOWN POPULATION VARIANCE 

In contrast to a point estimate, an interval estimate consists of an interval that we 
are willing to say, with varying degrees of conviction, contains the parameter 
being estimated. We may obtain both “one-sided” and “two-sided” interval 
estimates. However, since two-sided intervals are more often used, we shall 
consider only these. The bounds of a two-sided interval consist of two possible values 
of the parameter being estimated. It is a characteristic of point estimates that no 
statement of confidence can be attached to them. As we will show later, this is 
not true of interval estimates. We can obtain an interval to satisfy any degree of 
confidence that the interval does contain the parameter of interest. 

To consider the most extreme case, we may say with 100% confidence that the 
unknown mean of some population is contained in the interval $-\infty$ to $+\infty$. The 
uselessness of such an interval is obvious. Fortunately, we can obtain much 
narrower and, therefore, more useful intervals. The price we pay for a more useful 
interval is a reduction in confidence that it contains the parameter being estimated. 

Since we can attach a statement of confidence to each interval estimate we 
obtain, we can refer to interval estimates as confidence intervals and to the bounds 
of the interval as confidence limits. 

To obtain a useful interval estimate, we draw on our knowledge of sampling 
distributions. For example, if we want to obtain an interval estimate of a 
population mean, we recall what we know about the sampling distribution of $\bar{x}$, the 
estimator of the population mean. Chapter 5 showed that when sampling is from 
a normally distributed population, the sampling distribution of $\bar{x}$ is normally 
distributed with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. Knowing that 
$\bar{x}$ is normally distributed lets us make further statements about the distribution of 
$\bar{x}$. For example, approximately 95% of all the values of $\bar{x}$ are within two standard 
deviations of the mean $\mu$, regardless of its numerical value. In other words, the 
interval bounded by $\mu - 2\sigma_{\bar{x}}$ and $\mu + 2\sigma_{\bar{x}}$ has $\mu$ as its center and contains 
approximately 95% of all values of $\bar{x}$. 

Suppose we drew a sketch of the sampling distribution of $\bar{x}$, showing the points 
that are two standard deviations from the mean. In any practical situation this 
would not be feasible, since $\mu$, and hence $\mu_{\bar{x}}$, would be unknown. With $\mu_{\bar{x}}$ 
unknown, we would not know where on the $\bar{x}$ axis to center the distribution. In 
the general case, however, we can sketch the distribution as in Figure 6.3.1. 

In a practical situation, the expression $\mu \pm 2\sigma_{\bar{x}}$ is not by itself informative, 
since $\mu$ is unknown. But if $\mu$ is replaced by its estimator $\bar{x}$, the picture changes 
completely. In $\bar{x} \pm 2\sigma_{\bar{x}}$ we have an interval estimate of $\mu$. Furthermore, the 
nature of this interval is such that we can attach to it a statement of our degree 
of confidence that it actually contains the unknown parameter $\mu$. We can change 
our degree of confidence simply by changing the value of the numerical coefficient 
accompanying $\sigma_{\bar{x}}$. 

As you try to understand this confidence interval, consider the situation in which 
a confidence interval of the form $\bar{x} \pm 2\sigma_{\bar{x}}$ is computed for every possible value 
of $\bar{x}$. (If the population is infinite, imagine computing a large number of these 
confidence intervals.) The result would be a large number of intervals, all with 
widths equal to the width of the interval about the unknown $\mu$. The centers of 
95% of these intervals would fall within the interval about $\mu$. Each of these 
intervals, therefore, would contain $\mu$. Figure 6.3.2 shows the concept. It shows 
that $\bar{x}_1$, $\bar{x}_3$, and $\bar{x}_4$ all fall within the $2\sigma_{\bar{x}}$ interval about $\mu$. Thus the $2\sigma_{\bar{x}}$ intervals 
about these sample means “cover,” or include, $\mu$. The sample means $\bar{x}_2$ and $\bar{x}_5$ 
do not fall within the $2\sigma_{\bar{x}}$ interval about $\mu$. Therefore the $2\sigma_{\bar{x}}$ intervals about these 
sample means do not include $\mu$. 

FIGURE 6.3.1 Sampling distribution of $\bar{x}$, showing $\mu_{\bar{x}} - 2\sigma_{\bar{x}}$ and $\mu_{\bar{x}} + 2\sigma_{\bar{x}}$ 

FIGURE 6.3.2 Sampling distribution of $\bar{x}$, showing several confidence intervals for $\mu$ 
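
The repeated-sampling idea behind Figure 6.3.2 can be imitated by simulation. The sketch below is an illustration under assumed values, not part of the original text: it takes a normal population with $\mu = 50$ and $\sigma = 10$, samples of size 25, and 20,000 repetitions, then counts how often the interval $\bar{x} \pm 2\sigma_{\bar{x}}$ covers $\mu$. The function name `coverage` and all its parameters are hypothetical.

```python
import random
from math import sqrt

def coverage(mu=50.0, sigma=10.0, n=25, k=2.0, reps=20000, seed=7):
    """Fraction of intervals xbar +/- k * sigma / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    standard_error = sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - k * standard_error <= mu <= xbar + k * standard_error:
            hits += 1
    return hits / reps

print(coverage())  # close to 0.95
```

In practice $\mu$ is unknown, so this check is possible only in a simulation, where the population mean is fixed in advance.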

Let us examine the composition of the interval estimate, $\bar{x} \pm 2\sigma_{\bar{x}}$. The center 
of this interval is $\bar{x}$, the point estimator of $\mu$. The 2 is a value from the standard 
normal distribution that indicates within how many standard errors of the mean 
lie approximately 95% of the possible values of $\bar{x}$. This value of $z$ is referred to 
as the reliability coefficient. The value of the reliability coefficient depends on the 
value of the confidence coefficient, which specifies the degree of confidence that 
we can attach to the interval estimate. If the reliability coefficient is 2, the level 
of confidence is approximately 95%, and the confidence coefficient is 
approximately 0.95. In general, the confidence coefficient is equal to $1 - \alpha$, where $\alpha$ is 
the area under the curve of the sampling distribution of $\bar{x}$ that lies outside the 
interval about the unknown $\mu$. This means that $1 - \alpha$ is equal to the area under 
the curve that is included in the interval about $\mu$. See Figure 6.3.2. 

The last component of the interval estimate, $\sigma_{\bar{x}}$, is the standard error, or standard 
deviation of the estimator $\bar{x}$. In general, we may express a two-sided interval 
estimate as follows: 

Estimate ± (reliability factor) × (standard error) (6.3.1) 

In particular, when sampling is from a normal distribution with a known variance, 
an interval estimate for $\mu$ is given by 

$$\bar{x} \pm z_{1-\alpha/2}\,\sigma_{\bar{x}} \qquad (6.3.2)$$

where $z_{1-\alpha/2}$ is the value of $z$ to the right of which lies $\alpha/2$ of the area under the 
standard normal curve. Figure 6.3.3 shows $z_{1-\alpha/2}$ for the situation in which 
$\alpha = 0.05$. Table C shows that when $\alpha = 0.05$, $z_{1-\alpha/2} = z_{0.975} = 1.96$. That 
is, 0.975 of the area under the curve is to the left of 1.96. 

FIGURE 6.3.3 Standard normal distribution, showing $z_{1-\alpha/2}$ for $\alpha = 0.05$ 

Interpreting the Confidence Interval 

We can interpret the confidence interval in one of two ways. The first, called the 
probabilistic interpretation, is based on the probability of occurrence of intervals 
about $\bar{x}$ that include $\mu$. It may be stated as follows. 

In repeated sampling from a normally distributed population, $100(1 - \alpha)\%$ 
of all intervals of the form $\bar{x} \pm z_{1-\alpha/2}\,\sigma_{\bar{x}}$ that may be constructed from 
simple random samples of size $n$ will, in the long run, include the population 
mean $\mu$. 

The other interpretation, called the practical interpretation, may be stated as 
follows. 

We are $100(1 - \alpha)\%$ confident that the single interval $\bar{x} \pm z_{1-\alpha/2}\,\sigma_{\bar{x}}$, computed 
from a simple random sample of size $n$ from a normally distributed population, 
contains the population mean $\mu$. 

If we have constructed a 95% confidence interval, for example, that means that 
we are 95% confident that this single interval contains the population mean. We 
can make this statement because we know that 95% of all possible intervals 
constructed in this manner will contain $\mu$. We do not know, however, whether 
or not any given interval contains $\mu$ unless we know the true value of $\mu$. In 
constructing a 95% confidence interval, we usually prefer the value of $z$ that has 
to its right an area closer to 0.025 than is the area to the right of $z = 2$. Table C 
shows the appropriate value of $z$ to be 1.96. We may use any confidence coefficient 
we wish when we construct a confidence interval. Those most commonly used 
are 0.90, 0.95, and 0.99, for which the reliability factors are 1.645, 1.96, and 
2.58, respectively. 
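
The reliability factors quoted above come from Table C, but they can also be obtained numerically. The following sketch is offered as a convenience, not as the text's method: it uses the standard normal distribution in Python's standard library, and the helper name `reliability_factor` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def reliability_factor(confidence):
    """Return z such that the central area under the standard normal curve is `confidence`."""
    alpha = 1.0 - confidence
    return standard_normal.inv_cdf(1.0 - alpha / 2.0)

for confidence in (0.90, 0.95, 0.99):
    print(confidence, round(reliability_factor(confidence), 3))

# Area within two standard errors of the mean, the "approximately 95%" used earlier.
print(round(standard_normal.cdf(2.0) - standard_normal.cdf(-2.0), 4))  # 0.9545
```

To three decimal places the factor for 0.99 is 2.576, which rounds to the 2.58 used in the text.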

The following example illustrates the construction of a confidence interval. (In 
this chapter we will assume, unless otherwise indicated, that the population is 
large enough relative to the sample so that the finite population correction can be 
ignored.) 



EXAMPLE 6.3.1 You are the quality-control supervisor for a wire manufacturing 
company. Periodically you select a sample of wire specimens to test for breaking 
strength. Experience has shown that the breaking strengths of a certain type of 
wire are normally distributed with a standard deviation of 200 lb. A random sample 
of 16 specimens yields a mean of 6200 lb. You want a 95% confidence interval 
for the mean breaking strength of the population. 

The point estimate of $\mu$ is $\bar{x} = 6200$ lb, the $z$ value corresponding to a 
confidence coefficient of 0.95 is 1.96, and the standard error of the estimate is 
$\sigma/\sqrt{n} = 200/\sqrt{16} = 50$. The population from which the sample was drawn is 
normally distributed. Thus you can use Equation 6.3.2 to obtain 

6200 ± 1.96(50), 6102, 6298 

You are 95% confident that the population mean is contained in the interval 6102 
to 6298. You can make this statement because you know that, in repeated 
sampling, 95% of the intervals that you can construct in this manner will, in the long 
run, contain the population mean. 
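
The arithmetic of Example 6.3.1 can be reproduced with a few lines of Python. This is a minimal sketch of Equation 6.3.2, assuming a known population standard deviation; the function name `z_interval` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population standard deviation is known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Example 6.3.1: xbar = 6200 lb, sigma = 200 lb, n = 16, 95% confidence.
lower, upper = z_interval(6200, 200, 16)
print(round(lower), round(upper))  # 6102 6298
```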


This procedure for constructing the confidence interval for a population mean 
applies as long as the departure from normality is not too severe. If the sampled 
population deviates substantially from a normal distribution, we must take a further 
precaution, which we shall discuss later. 

In Equations 6.3.1 and 6.3.2, we refer to the product we obtain by multiplying 
the reliability factor by the standard error as the precision of the estimator. That 
is, 


Precision = (reliability factor)(standard error) (6.3.3) 


Different situations require different levels of precision. In one case in which the 
mean, expressed in dollars, is the parameter we wish to estimate, we might want 
the precision to be within 1 dollar. In another case, a precision of 5 dollars might 
suffice. 

A high level of confidence leads to a large reliability factor. A large reliability 
factor, when multiplied by a given standard error, yields a large product, which 
indicates low precision. High precision is indicated by a small numerical value of 
the product (reliability factor)(standard error). Remember that the standard error 
is equal to the population standard deviation divided by the square root of the 
sample size. For fixed values of both the reliability factor and the population 
standard deviation, the precision can be high or low, depending on the sample 
size. Larger samples result in higher precision. Smaller samples result in lower 
precision. For example, let the confidence coefficient be 0.95 (the reliability factor 
is 1.96), let $\sigma = 30$, and let $n = 25$. The precision is 

$$\text{Precision} = 1.96\left(\frac{30}{\sqrt{25}}\right) = 1.96(6) = 11.76$$

Now suppose we keep the same confidence coefficient and population $\sigma$, but 
increase the sample size to $n = 100$. We then have 

$$\text{Precision} = 1.96\left(\frac{30}{\sqrt{100}}\right) = 1.96(3) = 5.88$$
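
The effect of sample size on precision is easy to tabulate. The short sketch below simply evaluates Equation 6.3.3 for the two cases just worked; the function name `precision` is a hypothetical label for the product (reliability factor)(standard error).

```python
from math import sqrt

def precision(reliability_factor, sigma, n):
    """Reliability factor times the standard error sigma / sqrt(n)."""
    return reliability_factor * sigma / sqrt(n)

print(round(precision(1.96, 30, 25), 2))   # 11.76
print(round(precision(1.96, 30, 100), 2))  # 5.88
```

Quadrupling the sample size halves the product, because the standard error depends on the square root of $n$.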


Confidence Intervals for Means of Nonnormally Distributed Populations 


In many, perhaps most, practical cases, it is neither possible nor wise for you to 
assume that the population of interest is normally distributed, or even 
approximately normally distributed. This is not a problem as far as the use of Formula 
6.3.2 is concerned, as long as it is possible to take a large sample. We are 
interested in the sampling distribution. We know that the sampling distribution of 
$\bar{x}$ is approximately normally distributed regardless of the functional form of the 
sampled population, provided that the sample size is large. This result is based 
on the use of the central limit theorem (recall Chapter 5). 


EXAMPLE 6.3.2 A counselor for an employment agency draws a simple random 
sample of previous applicants in order to estimate the mean score all previous 
applicants made on an aptitude test. The sample, which consists of 150 applicants, 
yields a mean score of 3c = 68. The counselor knows from experience that the 
population variance is 100. The counselor also has evidence from experience that 
the population is not normally distributed. A 99% confidence interval for p is 
desired. 

Here we need to ask about the size of the population, to see whether we should 
use the finite population correction factor mentioned in Chapter 5. Suppose that 
the population consists of 1000 previous applicants. The sample then contains 
more than 5% of the population, and we must use the correction factor. The 
sample size is large, the central limit theorem applies, and we may use Formula 
6.3.2, with $\sigma_{\bar{x}}$ adjusted by the finite population correction factor to obtain the 
following interval: 

$$68 \pm 2.58\,\frac{10}{\sqrt{150}}\sqrt{\frac{1000 - 150}{1000 - 1}} = 68 \pm 2.58(0.82)(0.92), \quad 66.05,\ 69.95$$


We say that we are 99% confident that the population mean is contained in the 
interval 66.05 to 69.95. But now suppose that the population had been much 
larger, say N = 10,000. We would not have needed the finite population correction 
factor, and the interval would have been: 

68 ± 2.58(0.82), 65.88, 70.12 
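As a rough check on Example 6.3.2, the sketch below computes the interval with and without the finite population correction factor. The function name `z_interval` and its optional `population_size` argument are illustrative choices, not part of the text; the reliability factor 2.58 is taken from the example.

```python
import math

def z_interval(mean, sigma, n, z, population_size=None):
    """Interval mean +/- z * sigma/sqrt(n), optionally with the
    finite population correction sqrt((N - n) / (N - 1))."""
    se = sigma / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    margin = z * se
    return mean - margin, mean + margin

# Example 6.3.2: x-bar = 68, sigma = sqrt(100) = 10, n = 150, z = 2.58.
with_fpc = z_interval(68, 10, 150, 2.58, population_size=1000)
without_fpc = z_interval(68, 10, 150, 2.58)
print([round(v, 2) for v in with_fpc])
print([round(v, 2) for v in without_fpc])
```

Because the code carries full precision instead of the rounded factors 0.82 and 0.92, its limits can differ from the hand-computed ones in the second decimal place.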


Exercises 



6.3.1 A frozen food company wishes to know the mean length of ears of corn received 
in a large shipment. A random sample of 200 is collected and the ears measured. The 
arithmetic mean of the lengths is found to be 8.8 in. The population has a standard deviation 
of 1.5 in. What are the 95% confidence limits for $\mu$? 

6.3.2 A telephone answering service, at the end of each call, completes a report in which 
the length of the call is recorded. A simple random sample of 9 reports yields a mean 
length of call of 1.2 minutes. Construct the 99% confidence interval for the population 
mean. It is known that the population is normally distributed with a standard deviation of 
0.6 minute. 





6.3.3 The quality-control supervisor of a large manufacturing firm wishes to estimate the 
mean weight of 5500 packages of raw material. A simple random sample of 250 packages 
yields a mean of 65 lb. The population standard deviation is 15 lb. Construct a confidence 
interval for the unknown population mean $\mu$. Assume that a 95% confidence interval is 
satisfactory. 

6.3.4 For males between the ages of 17 and 21, a physical fitness research team wishes 
to estimate the mean consumption of oxygen after a standard set of exercises. Previous 
research has indicated that the population variance is 0.0512. A random sample of 25 
subjects yields the following results, in liters per minute: 2.87, 2.05, 2.90, 2.41, 2.93, 
2.94, 2.26, 2.21, 2.20, 2.88, 2.51, 2.51, 2.56, 2.59, 2.52, 2.51, 2.50, 2.58, 2.52, 2.58, 
2.44, 2.48, 2.43, 2.46, 2.46. Assume that the variable of interest is normally distributed. 
Obtain a 95% confidence interval for the population mean. 

6.3.5 You are an industrial psychologist who wants to estimate the mean age of a certain 
population of female employees. You draw a random sample of 60 females from the 
population. The sample yields a mean age of 23.67 years. You know that the population 
of ages is not normally distributed and that the population standard deviation is 15 years. 
Construct a 99% confidence interval. 


Student's t 
Distribution 


6.4 THE POPULATION MEAN—UNKNOWN POPULATION VARIANCE 


The procedures in Section 6.3 for constructing a confidence interval for a population 
mean depend on knowing the numerical value of the population variance. 
But often you don’t know the value of a population variance, nor the value of 
the mean. In the typical situation, both will be unknown. 

When we do not know the population variance, we cannot use Formula 6.3.2 
to construct a confidence interval for $\mu$, because we need $\sigma$ to compute 
$\sigma_{\bar{x}} = \sigma/\sqrt{n}$. In that case, we compute the sample standard deviation $s$ and use it to 
estimate $\sigma$, a procedure that leads to the following estimate of $\sigma_{\bar{x}}$: 

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} \qquad (6.4.1)$$


We may now substitute $s_{\bar{x}}$ for $\sigma_{\bar{x}}$ in the formula for the confidence interval for $\mu$. 
This procedure, however, does not completely solve the problem. The reliability 
factor $z$ in the formula is no longer available, since it is obtained from the relation 

$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad (6.4.2)$$

and $z$ is normally distributed. In other words, we can no longer determine the 
appropriate $z$ value accurately, since $\sigma$ is unknown. 

When we use the estimate of $\sigma_{\bar{x}}$, $s_{\bar{x}}$, in Equation 6.4.2, the resulting variable is 

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \qquad (6.4.3)$$

which we use in place of $z$ in the confidence interval for $\mu$. 





The next problem is to obtain a numerical value for t in a specific situation. 
We must consider the nature of the distribution of t. In other words, suppose that 
we take a very large number of samples of size n from a normally distributed 
population and use the mean and standard deviation of each to compute a value 
of t. The problem relates to the manner in which these values of t would be 
distributed. 

The nature of the distribution of 

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

was first investigated and reported by William Sealy Gosset (1876-1937). [See 
Gosset (1908).] Gosset published under the pseudonym “Student.” Consequently 
the distribution of t is frequently referred to as Student's t distribution. The 
properties of this distribution are as follows. 

1. It has a mean of 0. 

2. It is symmetrical about its mean. 

3. In general, it has a variance greater than 1, but the variance approaches 1 as 
the sample size increases. 

4. The variable t takes on values between $-\infty$ and $\infty$. 

5. The t distribution is really a family of distributions, since there is a different 
distribution for each degrees-of-freedom value. In the one-sample case, this is 
$n - 1$, the divisor used in computing $s^2$. 

6. In general, the t distribution is less peaked at the center and higher in the tails 
than the normal distribution. 

7. The t distribution approaches the normal distribution as n increases. 

Figure 6.4.1 compares the t distribution and the normal distribution. 
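Property 3 can be made concrete with a small sketch. It relies on a standard result that the text does not state: for more than 2 degrees of freedom, the variance of the t distribution is df/(df - 2). The function name `t_variance` is an illustrative choice.

```python
def t_variance(df):
    """Variance of Student's t with df degrees of freedom.

    Uses the standard result df / (df - 2), which holds only for df > 2;
    this formula is brought in from outside the text.
    """
    if df <= 2:
        raise ValueError("the variance is finite only for df > 2")
    return df / (df - 2)

# The variance exceeds 1 and moves toward 1 as the degrees of freedom grow.
for df in (3, 5, 15, 30, 120):
    print(df, round(t_variance(df), 3))
```

The printed values shrink toward 1, the variance of the standard normal distribution, which is consistent with properties 3 and 7.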

The t distribution, like the standard normal distribution, has been extensively 
tabulated. Table E gives one such table. To use it, we need to know the value of 
the confidence coefficient and the degrees of freedom. 

The general procedure for constructing confidence intervals for $\mu$ is not affected 
by the fact that we must obtain the reliability coefficient from the t table rather 
than from the z table. We still use the fact that we can express a confidence 
interval by the general relationship 

Estimate ± (reliability factor) × (standard error) 

To be more specific, when we are sampling from a normally distributed population 
with unknown $\sigma$, the $100(1 - \alpha)\%$ confidence interval for the population mean 
is given by the following expression (in which $t_{1-\alpha/2}$ is the reliability factor): 

$$\bar{x} \pm t_{1-\alpha/2}\,\frac{s}{\sqrt{n}} \qquad (6.4.4)$$

In theory, we should use this formula only when sampling is from a normally 
distributed population. Experience has shown, however, that moderate departures 
from this assumption do not appreciably affect the results. Therefore the t distribution 
is widely used even when it is known that the sampled population is not 
normally distributed. Most researchers require that the distribution of the population 
be at least mound-shaped. In Chapter 11, we shall discuss a procedure for 
determining whether or not it is likely that a population is normally distributed. 
The following example shows the use of the t distribution. 

FIGURE 6.4.1 The standard normal distribution and the t distribution 

EXAMPLE 6.4.1 In an effort to establish a standard time needed to perform a certain 
task, a production engineer randomly selects 16 experienced employees to perform 
the task. The mean time required by the 16 employees is 13 minutes. The standard 
deviation is 3 minutes. The production engineer wishes to construct a 95% confidence 
interval for the true mean length of time required to perform the task. 

If we can assume that these 16 measurements constitute a simple random sample 
from a normally distributed population of times required to perform the task, we 
can use Formula 6.4.4. Assume that these assumptions are reasonable. The point 
estimate is 13 minutes. The standard error of the estimate is $s/\sqrt{n} = 3/\sqrt{16} = 
0.75$. To find the reliability factor, we enter Table E with $n - 1 = 15$ degrees 
of freedom. The column containing the appropriate value of t is the one labeled 
$t_{0.975}$, since this is the one that contains t values with 0.025 of the area under the 
curve to their right. (The negatives of these values have 0.025 of the area under 
the curve to their left.) The appropriate t value is 2.1315. The desired interval is 

13 ± 2.1315(0.75), 11.4, 14.6 

From the evidence this sample provides, we say that we are 95% confident that 
the true mean time required to perform the task is between 11.4 and 14.6 minutes. 
We can say this because of the probabilistic interpretation that says that, in repeated 
sampling, 95% of the intervals that can be constructed in the same manner 
include the true mean. 
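The arithmetic of Example 6.4.1 can be sketched as follows. The function name `t_interval` is illustrative, and the reliability factor 2.1315 is simply the tabulated value quoted in the example (15 degrees of freedom) and is not computed by the code.

```python
import math

def t_interval(mean, s, n, t_value):
    """Interval mean +/- t * s/sqrt(n), with t supplied from a t table."""
    se = s / math.sqrt(n)      # estimated standard error of the mean
    margin = t_value * se
    return mean - margin, mean + margin

# Example 6.4.1: x-bar = 13, s = 3, n = 16, t(0.975, 15 df) = 2.1315.
low, high = t_interval(13, 3, 16, 2.1315)
print(round(low, 1), round(high, 1))
```

Passing the table value in as an argument keeps the sketch free of any statistical library; a library quantile function could supply the same number.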


Confidence Intervals for Means of Nonnormally Distributed Populations: Unknown Population Variance 


We have said that, if we are to use Formula 6.4.4, the distribution of the sampled 
population must not deviate too much from a normal distribution. When sampling 
is from a nonnormally distributed population, the central limit theorem guarantees 
at least an approximately normally distributed sampling distribution of $\bar{x}$ when the 
sample size is large. We said earlier that when you are sampling from a nonnormally 
distributed population, you should draw a large sample and use Formula 
6.3.2. But Formula 6.3.2 requires a knowledge of $\sigma$, which is unknown under 
conditions of the type we are now discussing. Again we may use the sample 



standard deviation to estimate $\sigma$ and use the resulting modification of Formula 
6.3.2, which is as follows: 

$$\bar{x} \pm z_{1-\alpha/2}\,\frac{s}{\sqrt{n}} \qquad (6.4.5)$$

We can do this because we assume that, since the sample size is large, it provides 
an adequate estimate of $\sigma$. In fact, many people use Formula 6.4.5 when $\sigma$ is 
unknown and $n$ is large, whether or not they assume the population to be normally 
distributed. In other words, the size of the sample, rather than whether $\sigma$ is known 
or unknown, is often used as the criterion for determining whether to use z or t 
as the reliability factor when constructing confidence intervals. This is reasonable, 
because the t distribution approaches the standard normal distribution as the size 
of the sample increases. 

EXAMPLE 6.4.2 A real estate firm wants to develop a new shopping center. It 
wishes to know the average size of grocery stores in existing shopping centers. 
The firm’s researchers are unwilling to assume that the population of store sizes 
is normally distributed. To obtain an interval estimate of $\mu$, therefore, they decide 
to take a large sample. Then the central limit theorem will apply, and they can 
use the standard normal distribution to obtain their reliability factor. The population 
standard deviation is unknown, but they feel that a large sample will yield 
a satisfactory estimate of $\sigma$. The researchers draw a random sample of 50 grocery 
stores. They find the sample mean and standard deviation to be 10,000 and 4800 
square feet, respectively. Under the circumstances, the appropriate formula for 
constructing a confidence interval for $\mu$ is Formula 6.4.5. Using this formula 
gives the following 95% confidence interval: 

$$10{,}000 \pm 1.96\left(\frac{4800}{\sqrt{50}}\right), \quad 8669;\ 11{,}331$$

Paired Observations 

A special case of statistical inference about a single population mean occurs when 
the data for analysis consist of paired observations. We can generate paired observations 
in a variety of ways. We can take measurements on subjects before 
and after some intervening treatment, environmental alteration, and so forth. For 
example, we might gather individual production data on assembly-line employees 
before and after the initiation of measures designed to improve their working 
conditions. In the laboratory, we may divide individual specimens of material into 
two parts and subject each half to a different experimental procedure. For example, 
we can record tensile strengths for batches of plastic prepared according to the 
same formula, except that the two halves of each batch receive different amounts 
of a key ingredient. Another way to obtain paired observations is to match pairs 
of subjects according to as many relevant characteristics as possible. Then you 
apply one treatment to one member of each pair and a different treatment to the 
other member. For example, we pair salespersons by matching according to age, 
years of experience, education, level of initiative, and so forth. Then we assign 
one member of each pair to a training course taught by one method and the other 
member to a training course taught by another method. 


In such situations, we are interested in the difference between the results produced 
under the two different conditions. We may, for example, want to determine 
the extent to which a change in working conditions causes a change in production 
volume. For individual employees, we can compare the number of units produced 
in a week before the change with the number produced during a week after the 
change. From the difference between these two observations, we can tell whether 
an individual employee’s weekly production has increased, decreased, or remained 
the same. To assess the overall effect of the change, we examine the difference 
in production for the group as a whole. A measure that seems particularly relevant 
is the mean of the individual differences between before and after production 
figures. Indeed, this is a measure of considerable interest in analyzing paired 
observations. 

Let $d$ denote the difference between two paired observations $x_1$ and $x_2$. We may 
define the sample mean of the differences (mean difference) as 

$$\bar{d} = \frac{\sum d_i}{n} \qquad (6.4.6)$$

where $n$ is the number of pairs of observations. The sample variance, denoted by 
$s_d^2$, may be computed as follows: 

$$s_d^2 = \frac{\sum (d_i - \bar{d})^2}{n - 1} = \frac{n\sum d_i^2 - \left(\sum d_i\right)^2}{n(n - 1)} \qquad (6.4.7)$$

We can use the sample mean difference $\bar{d}$ as a point estimator of the population 
mean difference $\mu_d$. When the population of differences is normally distributed 
with unknown variance, a $100(1 - \alpha)\%$ confidence interval for $\mu_d$ is given by 

$$\bar{d} \pm t_{1-\alpha/2}\,\frac{s_d}{\sqrt{n}} \qquad (6.4.8)$$

where $s_d/\sqrt{n}$ is the estimated standard error of the mean difference. The degrees 
of freedom for t are $n - 1$. If the central limit theorem is applicable, we can use 
z in place of t in Equation 6.4.8. 


EXAMPLE 6.4.3 A simple random sample of 10 electronics firms are asked in a 
questionnaire to state the amount of money spent on employee training programs 
during the year just ended and during a year a decade ago. Table 6.4.1 gives the 
results (adjusted for inflation). 

We wish to construct a 95% confidence interval for the mean difference in 
expenditures for employee training programs by the 10 firms. 


TABLE 6.4.1 Amount of money spent on employee training by 10 firms (×$1000) 

Firm   Year just ended ($x_1$)   Decade ago ($x_2$)   $d_i$ 
A               12                      10              2 
B               14                      11              3 
C                8                       8              0 
D               12                       7              5 
E                8                       9             -1 
F               10                       6              4 
G                8                      10             -2 
H                9                       9              0 
I               10                       7              3 
J               10                       9              1 


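From the last column of Table 6.4.1, we compute 

$$\bar{d} = \frac{2 + 3 + \cdots + 1}{10} = \frac{15}{10} = 1.5$$

$$s_d = \sqrt{\frac{10(2^2 + 3^2 + \cdots + 1^2) - (15)^2}{(10)(9)}} = \sqrt{5.17} = 2.3$$

$$s_{\bar{d}} = \frac{2.3}{\sqrt{10}} = 0.73$$

The 95% confidence interval for $\mu_d$ is 

1.5 ± 2.2622(0.73), -0.2, 3.2 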
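The computations in Example 6.4.3 can be sketched with the standard library. The variable names are illustrative; the data are the Table 6.4.1 values, and the reliability factor 2.2622 is the tabulated t value quoted in the example (9 degrees of freedom) and is not derived by the code.

```python
import math
import statistics

# Table 6.4.1 (in $1000): year just ended and a year a decade ago, firms A-J.
recent = [12, 14, 8, 12, 8, 10, 8, 9, 10, 10]
decade_ago = [10, 11, 8, 7, 9, 6, 10, 9, 7, 9]

d = [x1 - x2 for x1, x2 in zip(recent, decade_ago)]  # paired differences
d_bar = statistics.mean(d)        # sample mean difference
s_d = statistics.stdev(d)         # sample standard deviation (divisor n - 1)
se = s_d / math.sqrt(len(d))      # estimated standard error of d-bar

t_value = 2.2622                  # from the t table, 9 degrees of freedom
low, high = d_bar - t_value * se, d_bar + t_value * se
print(d_bar, round(s_d, 2), round(se, 2))
print(round(low, 2), round(high, 2))
```

Carrying full precision gives limits of roughly -0.13 and 3.13, slightly narrower than the hand-computed -0.2 and 3.2, which come from rounding $s_d$ up to 2.3 before dividing.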


Exercises 




6.4.1 In a study to determine the feasibility of using a flexible plastic hose on a certain 
piece of machinery, engineers want to estimate the mean pressure to which the hose will 
be subjected. They take nine pressure readings randomly throughout a 24-hour period of 
operation. The sample mean and standard deviation are 362 and 45, respectively. Assume 
that pressure readings are approximately normally distributed. Construct the 99% confidence 
interval for the true mean pressure. 

6.4.2 A simple random sample of 16 radio stations is selected in order to estimate the 
average charge for the same fixed-length spot announcement. The sample mean and standard 
deviation are $15.50 and $8.00, respectively. Assume that the charges made by all radio 
stations of the type sampled are approximately normally distributed. Construct the 95% 
confidence interval for the population mean. 

6.4.3 A record club wishes to know the average age of its members. A random sample 
of 100 members yields a mean age of 26 years with a standard deviation of 5 years. 
Assume that the population of ages is not normally distributed. Find the 95% confidence 
interval for $\mu$. 

6.4.4 A soft-drink manufacturer wants to know the extent of customer preference for 
twist-off resealable tops for 32-ounce bottles. To investigate this, the manufacturer sets 
up a study in which the regular 32-ounce bottles are replaced with bottles with twist-off 
resealable tops in 16 randomly selected supermarkets in a certain area for a period of one 
month. The sales volume for each store for that month is compared with the same store’s 
sales volume for the preceding month. The results (in hundreds of bottles) are as shown 
in the table. Construct a 95% confidence interval for $\mu_d$ based on $d_i = x_{1i} - x_{2i}$. 

Store                      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16 
Test month ($x_1$)        44  61  46  55  49  50  45  64  40  62  53  54  57  55  61  52 
Preceding month ($x_2$)   47  62  56  39  45  51  56  47  48  40  44  52  51  60  61  57 


6.5 THE DIFFERENCE BETWEEN TWO POPULATION MEANS- 
KNOWN POPULATION VARIANCES 

We often want to know the difference between two population means, $\mu_1 - \mu_2$. 
In the absence of direct knowledge of this difference, we estimate it from sample 
data. Chapter 5 showed that $\bar{x}_1 - \bar{x}_2$ is an unbiased estimator of $\mu_1 - \mu_2$. It also 
showed that when two populations are normally distributed, the sampling distribution 
of $\bar{x}_1 - \bar{x}_2$, computed from independent random samples, is normally 
distributed, with a standard error given by 

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

We can construct a $100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$, then, by 

$$(\bar{x}_1 - \bar{x}_2) \pm z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad (6.5.1)$$

This applies when the sampled populations are normally distributed, the population 
variances are known, and the samples are randomly and independently drawn 
from the two populations. Here is an example of the construction of such an 
interval. 

EXAMPLE 6.5.1 A manufacturer produces a synthetic fiber at two factories located 
in different parts of the country. Every effort is made to maintain uniformity of 
production between the two factories with respect to the mean breaking strength 
of the fiber. To determine whether or not the two factories are maintaining uniformity 
of production, the manufacturer selects a sample of 25 specimens from 
Factory 1 and a sample of 16 specimens from Factory 2. The mean breaking 
strength of the sample from Factory 1 is 22 lb. The mean of the sample from 
Factory 2 is 20 lb. The variance in both factories is known to be 10 lb$^2$. The 
populations are normally distributed. The following 95% confidence interval is 
constructed using Formula 6.5.1: 

$$(22 - 20) \pm 1.96\sqrt{\frac{10}{25} + \frac{10}{16}}, \quad 0.0,\ 4.0$$

The usual probabilistic and practical interpretations may be given to this interval. 
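Formula 6.5.1, applied to the numbers of Example 6.5.1, might be sketched like this. The function name `diff_means_z_interval` is an illustrative choice, and 1.96 is the reliability factor used in the example.

```python
import math

def diff_means_z_interval(mean1, mean2, var1, var2, n1, n2, z):
    """Interval for mu1 - mu2 when both population variances are known."""
    se = math.sqrt(var1 / n1 + var2 / n2)  # standard error of x1-bar - x2-bar
    diff = mean1 - mean2
    return diff - z * se, diff + z * se

# Example 6.5.1: means 22 and 20 lb, common variance 10, n1 = 25, n2 = 16.
low, high = diff_means_z_interval(22, 20, 10, 10, 25, 16, 1.96)
print(round(low, 1), round(high, 1))
```

Before rounding, the lower limit is slightly above zero (about 0.02), so the 0.0 shown in the example is a rounded figure.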


Exercises 



6.5.1 A gasoline company wishes to compare credit-card holders in one area with those 
in another area. One question is how long the customers have held credit cards. A random 
sample of 100 card holders is selected from each area. The sample means are 156 months 
(Area 1) and 96 months (Area 2). The population variances for the two areas are 900 and 
700, respectively. Construct a 95% confidence interval for the difference between population 
means. 

6.5.2 A bank official who wants to know the difference between the average amount of 
money customers have on deposit in two branch banks selects a random sample of 25 
customers from each branch. The sample means are: Branch A: $450; Branch B: $325. 
The two populations are normally distributed with variances $\sigma_A^2 = 750$ and $\sigma_B^2 = 850$. 
(a) Construct a 95% confidence interval for $\mu_A - \mu_B$; (b) construct a 99% confidence 
interval. 

6.5.3 A firm hires a team of psychologists to study the difference between the characteristics 
of employees who attended a special training course and those who did not. The 
researchers examine a simple random sample of 50 employees who attended the course 
and an independent simple random sample of 60 who did not. At the end of six months, 
they give the employees a job-satisfaction test. Those who attended the course had a mean 
score of 4.50. Those who did not had a mean score of 3.75. Construct a 95% confidence 
interval for the difference between population means. On the basis of experience, the 
researchers assume that the two populations are approximately normally distributed. The 
variances are 1.8 for the population that attended the course and 2.1 for the population 
that did not. 

6.5.4 A researcher wants to construct a 95% confidence interval for the difference between 
the mean IQs of two groups of employees. The two populations of IQ scores are approximately 
normally distributed with variances of $\sigma_1^2 = 100$ and $\sigma_2^2 = 144$. Independent 
simple random samples of sizes $n_1 = 25$ and $n_2 = 16$ yield the following results. 

Sample 1: 108 111 107 120 101 102 108 103 119 124 109 101 103 114 116 128 116 111 115 115 111 129 116 126 109 

Sample 2: 116 117 98 94 89 124 99 114 114 99 89 110 117 99 94 92 



6.5.5 A physical fitness counselor wishes to compare the muscular power of young executives 
who exercise regularly with that of executives of the same age and sex who do 
not exercise regularly. Subjects consist of 40 randomly selected exercisers and 50 randomly 
selected nonexercisers. Mean muscular endurance scores: 17.35 (for exercisers) and 15.19 
(for nonexercisers). The counselor feels that these scores are normally distributed. Experience 
shows that the variance for each group is about 2.25. The counselor would like to 
construct a 95% confidence interval for the difference between population means. 


6.6 THE DIFFERENCE BETWEEN TWO POPULATION MEANS- 
UNKNOWN POPULATION VARIANCES 


Normally Distributed Populations, Unknown but Equal Variances 


Section 6.5 explained how to construct a confidence interval for the difference 
between two population means when the sampled populations are both normally 
distributed and the two population variances are known. But in real life, conditions 
are usually quite different. Sampled populations may be nonnormally distributed 
and/or population variances may be unknown. Here we consider three possible 
situations. 

1. The populations are normally distributed, and the population variances are 
unknown but equal. 

2. The populations are normally distributed, and the population variances are 
unknown and unequal. 

3. The populations are not normally distributed, and the population variances are 
unknown. 

When we want to estimate the difference between two population means, we 
cannot use Formula 6.5.1 if the population variances are unknown. Suppose that 
the populations are normally distributed, the population variances are unknown, 
but known to be equal, and we draw independent random samples from each of 
the two populations of interest. We may then construct a confidence interval for 
$\mu_1 - \mu_2$ by using the t distribution to obtain a reliability factor and by estimating 
the population variances from sample data. Chapter 7 gives a procedure for determining 
whether or not two population variances are likely to be equal. 

Say that the assumption of equal population variances is justified. The sample 
variances computed from the two samples will each be an estimate of the common 
variance $\sigma^2$. We capitalize on this fact by pooling the two sample estimates to 
obtain a single estimate of $\sigma^2$. To do this, we compute the weighted average of 
the two sample variances, where the weights are the degrees of freedom. If the 
sample sizes are equal, the weighted average is merely the arithmetic mean of the 
two sample variances. If, however, the two sample sizes are unequal, the weighted 
average takes advantage of the additional information provided by the larger sample. 
The pooled estimate of the common $\sigma^2$ is given by 

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \qquad (6.6.1)$$

The estimated standard error of the estimator, $\bar{x}_1 - \bar{x}_2$, then, is 

$$s_{\bar{x}_1 - \bar{x}_2} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

It can be shown that 

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

follows the t distribution with $n_1 + n_2 - 2$ degrees of freedom. This justifies the 
use of t as a reliability factor in the confidence interval for $\mu_1 - \mu_2$ under the 
conditions mentioned. 

In summary, we may state the following: 


When random and independent samples of size $n_1$ and $n_2$, respectively, are 
drawn from two normally distributed populations with unknown but equal 
variances, the $100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is given by 

$$(\bar{x}_1 - \bar{x}_2) \pm t_{1-\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad (6.6.2)$$
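A minimal sketch of Formulas 6.6.1 and 6.6.2 follows, using the summary statistics of Example 6.6.1 (the cabbage weights) as input. The function names are illustrative, and the reliability factor 2.0301 is the tabulated t value that the example quotes for 35 degrees of freedom.

```python
import math

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Weighted average of two sample variances, weights = degrees of freedom."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_t_interval(mean1, mean2, s1_sq, s2_sq, n1, n2, t_value):
    """Interval for mu1 - mu2 assuming equal but unknown population variances."""
    sp = math.sqrt(pooled_variance(s1_sq, n1, s2_sq, n2))
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - t_value * se, diff + t_value * se

# Example 6.6.1: means 44.1 and 31.7 oz, variances 36 and 44, n = 25 and 12.
print(round(pooled_variance(36, 25, 44, 12), 2))
low, high = pooled_t_interval(44.1, 31.7, 36, 44, 25, 12, 2.0301)
print(round(low, 1), round(high, 1))
```

With equal sample sizes, `pooled_variance` reduces to the plain average of the two sample variances, as the text notes.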


EXAMPLE 6.6.1 Experimenters test two types of fertilizer for possible use in the 
cultivation of cabbages. They grow the cabbages in two different fields. One of 
the two fertilizers is applied in each field. At harvest time they select a random 
sample of 25 cabbages from the crop grown with fertilizer I. They randomly select 
12 cabbages from the crop grown with fertilizer II. The sample mean and variance 
of the weights of cabbages grown with fertilizer I are 44.1 oz and 36 oz$^2$. The 
mean weight computed from the second sample is 31.7 oz, and the variance is 
44 oz$^2$. The experimenters assume that the two populations of weights are normally 
distributed. They also assume that the two population variances are equal. Therefore 
they compute the following pooled estimate of $\sigma^2$: 

$$s_p^2 = \frac{24(36) + 11(44)}{25 + 12 - 2} = \frac{864 + 484}{35} = 38.51$$

Formula 6.6.2 is used to compute the 95% confidence interval for $\mu_{\text{I}} - \mu_{\text{II}}$. 

$$(44.1 - 31.7) \pm 2.0301\sqrt{38.51}\sqrt{\frac{1}{25} + \frac{1}{12}}, \quad 8.0,\ 16.8$$

Unequal Population Variances 

When the population variances are unequal, we may not use the t distribution as 
previously outlined to construct confidence intervals for the difference between 
two means, even if the populations are normally distributed. This is a problem of 
no small consequence, since it seems reasonable that in many practical cases we 
cannot expect two population variances to be equal. 

A solution to the problem was proposed by Behrens (1929) and later verified 
and generalized by Fisher (1939, 1941). Neyman (1941), Scheffe (1943, 1944), 
and Welch (1937, 1947) have also offered solutions. The problem is also discussed 
by Aspin (1949), Trickett et al. (1956), and Cochran (1964). Cochran’s approach 
is included in Snedecor and Cochran (1980). 

The problem results from the fact that the quantity 

$$t' = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

does not follow a t distribution with $n_1 + n_2 - 2$ degrees of freedom when the 
population variances are not equal. One way around the problem is to use a 
modified value for the degrees of freedom. Dixon and Massey (1969) give the 
following formula for the modified degrees of freedom: 

$$df' = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1} + \dfrac{(s_2^2/n_2)^2}{n_2}} \qquad (6.6.3)$$

If the assumptions of normality hold, $t'$ is distributed approximately as t with 
degrees of freedom given by Equation 6.6.3. Under these conditions, then, the 
$100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is given by 

$$(\bar{x}_1 - \bar{x}_2) \pm t'_{1-\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \qquad (6.6.4)$$

The numerical value of the modified degrees of freedom computed using Equation 
6.6.3 is not always an integer. If $df'$ is a fraction, the closest value to $df'$ given 
in the table of the t distribution is usually satisfactory. We interpret a confidence 
interval obtained using Equation 6.6.4 in the usual manner, but we should remember 
that the limits are only approximate. 

EXAMPLE 6.6.2 Refer to Example 6.6.1. Assume, for illustrative purposes, that 
the two population variances are not equal. In preparing to use Equation 6.6.4 to 
construct a confidence interval for $\mu_1 - \mu_2$, first compute the degrees of freedom 
using Equation 6.6.3: 

$$df' = \frac{(36/25 + 44/12)^2}{\dfrac{(36/25)^2}{25} + \dfrac{(44/12)^2}{12}} = 21.67 \approx 22$$

The value of t for a confidence coefficient of 0.95 and 22 degrees of freedom is 
2.0739, so that by Equation 6.6.4, the approximate 95% confidence interval for 
$\mu_1 - \mu_2$ is 

$$(44.1 - 31.7) \pm 2.0739\sqrt{\frac{36}{25} + \frac{44}{12}}, \quad 7.7,\ 17.1$$
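The figures in Example 6.6.2 can be checked with the sketch below. It uses sample sizes $n_1$ and $n_2$ as the divisors in the denominator of the degrees-of-freedom formula, which is the form that reproduces the example's 22 degrees of freedom; other references give a variant with $n - 1$ divisors, which would give a different value. The function names are illustrative, and 2.0739 is the tabulated t value quoted in the example.

```python
import math

def modified_df(s1_sq, n1, s2_sq, n2):
    """Modified degrees of freedom with divisors n1 and n2
    (the form consistent with Example 6.6.2)."""
    a = s1_sq / n1
    b = s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / n1 + b ** 2 / n2)

def unequal_variance_interval(mean1, mean2, s1_sq, s2_sq, n1, n2, t_value):
    """Approximate interval for mu1 - mu2 without pooling the variances."""
    se = math.sqrt(s1_sq / n1 + s2_sq / n2)
    diff = mean1 - mean2
    return diff - t_value * se, diff + t_value * se

# Example 6.6.2: variances 36 and 44, n = 25 and 12, means 44.1 and 31.7.
df = modified_df(36, 25, 44, 12)
print(round(df, 2), round(df))
low, high = unequal_variance_interval(44.1, 31.7, 36, 44, 25, 12, 2.0739)
print(round(low, 1), round(high, 1))
```

The interval is a little wider than the pooled one from Example 6.6.1, reflecting both the larger reliability factor and the unpooled standard error.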


Nonnormally Distributed Populations and Unknown Variances 


When we seek a confidence interval for the difference between two means, we 
may find that not only are the two population variances unknown, but the populations 
are not normally distributed. However, if $n_1$ and $n_2$ are both large, the 
central limit theorem applies, and we may use $s_1$ and $s_2$ to estimate $\sigma_1$ and $\sigma_2$. 
Under these circumstances, an approximate $100(1 - \alpha)\%$ confidence interval for 
$\mu_1 - \mu_2$ is given by 

$$(\bar{x}_1 - \bar{x}_2) \pm z_{1-\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \qquad (6.6.5)$$


Exercises 

6.6.1 The difference in the ages of patients admitted to two different hospitals is of interest 
to an insurance company studying patterns of use of health facilities in a certain area. A 
simple random sample of discharge records is drawn from the files of each of the two 
hospitals, with the following results: For Hospital A: $n = 22$, $\bar{x} = 54.5$, $s^2 = 256$. For 
Hospital B: $n = 20$, $\bar{x} = 39.4$, $s^2 = 200$. It is felt that the two populations are approximately 
normally distributed. (a) Assume that the two population variances are equal. 
Construct the 95% confidence interval for $\mu_A - \mu_B$. (b) Assume that $\sigma_A^2 \neq \sigma_B^2$. Construct 
the 95% confidence interval for $\mu_A - \mu_B$. 

6.6.2 A telephone answering service completes a report on each call received, noting the 
length of the call. The answering service serves only two types of clients. A simple random 
sample of 225 records for the type A client and an independent, simple random sample of 
100 records for the type B client give the following means and variances for length of 
call: $\bar{x}_A = 121.4$ seconds, $\bar{x}_B = 93.5$ seconds, $s_A^2 = 900$, $s_B^2 = 1200$. Assume that the 
populations are not normally distributed with equal variances. Compute the 90% confidence 
interval for $\mu_A - \mu_B$. 




6.6.3 In a certain factory, two machines are used to produce metal rods. A random sample 
of 11 rods from Machine A and a random sample of 21 rods from Machine B give these 
results with respect to the lengths of metal rods produced: $\bar{x}_A = 5.95$ in., $\bar{x}_B = 6.01$ in., 
$s_A^2 = 0.018$, $s_B^2 = 0.020$. Assume that the populations are approximately normally distributed. 
(a) Assume that the population variances are equal. Construct the 95% confidence 
interval for $\mu_A - \mu_B$. (b) Do part (a) on the assumption that the two population variances 
are not equal. 

6.6.4 A research team wishes to know by how much, on the average, employees who 
have problems with alcohol and those who do not differ with respect to self-control. The 
researchers give a test designed to measure self-control to $n_1 = 21$ employees who do not 
have problems with alcohol and $n_2 = 16$ problem drinkers. The results are as follows: 
$\bar{x}_1 = 29.75$, $s_1^2 = 68.06$, $\bar{x}_2 = 24.50$, $s_2^2 = 47.61$. The researchers believe that the test 
yields scores that are approximately normally distributed. The sample variances are not 
equal, but the researchers believe that the population variances are equal. Construct a 95% 
confidence interval for the difference between population means. 


6.7 INTERVAL ESTIMATION: THE POPULATION PROPORTION 

In many business situations, we want to know what proportion of items in a 
population have a certain characteristic. A firm may want to find out what proportion 
of its customers would respond favorably to a change in the way service 
is provided. A company may want to know what proportion of its employees have 
not completed high school. A manufacturer may want to know what proportion 
of rejected items are rejected because of defective material. 

To estimate a population proportion, we do the same things we did when we 
estimated a population mean. We randomly draw a sample of size $n$ from the 
population of interest and compute the sample proportion $\hat{p}$. We use this sample 
proportion as a point estimate of the population proportion. We obtain a confidence 
interval for the population proportion by the general formula 

Estimate ± (reliability factor) × (standard error) 

Chapter 5 showed that when both $np$ and $n(1 - p)$ are greater than 5, the 
sampling distribution of $\hat{p}$ is approximately normal with mean $p$ and standard error 
$\sigma_{\hat{p}} = \sqrt{p(1 - p)/n}$. Since we are looking for an estimate of $p$, it must be unknown. 
Therefore in practical situations we use $\hat{p}$ rather than $p$ in the formula for 
the standard error. We obtain its estimate $s_{\hat{p}} = \sqrt{\hat{p}(1 - \hat{p})/n}$. When we can 
consider the sampling distribution of $\hat{p}$ approximately normal, we can obtain the 
reliability factor in the formula for the confidence interval from the table of the 
standard normal distribution. As indicated in Chapter 5, if the sample constitutes 
more than 5% of the population, we should use the finite population correction 
factor. The examples and exercises that follow assume that $n$ is sufficiently small 
relative to $N$, so that the correction factor is not needed. 

In summary, then, we may state the following: 


When $np$ and $n(1 - p)$ are both greater than 5, and when $n$ is small relative
to the size of the population, the approximate $100(1 - \alpha)\%$ confidence interval
for $p$ is given by

$$\hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \qquad (6.7.1)$$


This interval is given the usual probabilistic and practical interpretations. 


EXAMPLE 6.7.1 The personnel director of a large company, trying to find out what 
proportion of all persons who have ever been interviewed for any position have 
been hired, is willing to settle for a 95% confidence interval. A random sample 
of 500 interview records reveals that 76 (or 0.152) of the persons in the sample 
have been hired. 

The 95% confidence interval for the population proportion, by Equation 6.7.1,
is

$$0.152 \pm 1.96\sqrt{\frac{(0.152)(0.848)}{500}} = 0.152 \pm 0.031$$

so the limits of the interval are 0.121 and 0.183.


The interpretation of this interval is the same as the interpretation made of the 
confidence interval for the arithmetic mean. In the long run, 95% of the intervals 
constructed in this manner will include the population proportion. We are therefore 
95% confident that the interval actually constructed contains the population
proportion.
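The calculation in Example 6.7.1 can be sketched in a few lines of Python. This is only an illustration of Equation 6.7.1, not part of the original text: the function name `proportion_ci` is hypothetical, the large-sample check uses $\hat{p}$ in place of the unknown $p$, and the reliability factor comes from the standard library's normal distribution rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Approximate confidence interval for a population proportion (Equation 6.7.1)."""
    p_hat = successes / n
    # Large-sample condition from the text, checked with p_hat in place of p.
    if n * p_hat <= 5 or n * (1 - p_hat) <= 5:
        raise ValueError("normal approximation is not justified for this sample")
    # Reliability factor z for the requested confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Estimated standard error of the sample proportion.
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# Data of Example 6.7.1: 76 of 500 interview records led to a hire.
lower, upper = proportion_ci(76, 500)
print(f"{lower:.3f} to {upper:.3f}")
```

Run on the data of the example, the sketch gives limits that agree with 0.121 and 0.183 to three decimal places.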


6.7.1 A consultant for an association of personnel directors wants to find what proportion 
of clerical personnel who change jobs do so because they are bored with their work. The 
consultant queries a random sample of 400 clerical workers who recently changed jobs. 
Two hundred state that they changed jobs because they were bored. The consultant prepares 
a 95% confidence interval for the true proportion changing jobs because of boredom. What 
are the lower and upper limits of this interval? 

6.7.2 A company wishes to estimate the proportion of its employees who have read with 
retention a safety leaflet that was distributed last week to all employees. A random sample 
of 300 employees is given a test to measure retention of the contents of the leaflet. Of 
those tested, 75 make a passing score. Construct a 95% confidence interval for the true 
proportion retaining sufficient knowledge of the contents of the leaflet to make a passing 
score. 

6.7.3 In a study of the reasons for employee turnover, an investigator draws a sample of 
200 from a population of former employees of a firm. Of the 200, 140 say that they left 
because they couldn’t get along with their supervisors. Construct a 95% confidence interval 
estimate of the true proportion who left the firm for this reason. 

6.7.4 When you interview a random sample of 175 adults, 79 tell you that they feel that 
their community’s most pressing social problem is drug and alcohol abuse. Construct a 
95% confidence interval for the proportion in the population who hold that opinion. 





6.8 THE DIFFERENCE BETWEEN TWO POPULATION PROPORTIONS 


It is often worthwhile to have some idea of the magnitude of the difference between 
two population proportions. For example, we may wish to compare—with respect 
to some characteristic of interest—men and women, two age groups, two types 
of firms, two socioeconomic groups, or two factories. 

To estimate $p_1 - p_2$, the difference between the population proportions, we
draw a simple random sample from each of the populations and use $\hat{p}_1 - \hat{p}_2$, the
difference between the sample proportions. We may construct an interval estimate
in the usual manner.

Chapter 5 showed that when $n_1$ and $n_2$ are both large, and the population
proportions are not too close to 0 or 1, the sampling distribution of $\hat{p}_1 - \hat{p}_2$ for
independent samples is approximately normally distributed with mean $p_1 - p_2$
and standard error

$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

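Since $p_1$ and $p_2$ are unknown, the standard error has to be estimated by

$$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

Under the given conditions, then, an approximate $100(1 - \alpha)\%$ confidence
interval for $p_1 - p_2$ is given by

$$(\hat{p}_1 - \hat{p}_2) \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \qquad (6.8.1)$$
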

EXAMPLE 6.8.1 An ad agency conducts a survey to study the characteristics of 
subscribers to two newspapers. A random sample of 500 subscribers to Newspaper 
A reveals that 300 have annual incomes in excess of $50,000. In the case of 
Newspaper B, 200 out of a random sample of 500 subscribers have annual incomes 
in excess of $50,000. Construct a 95% confidence interval for the difference 
between the two proportions of subscribers with annual incomes in excess of 
$50,000. 

From the information given, we compute $\hat{p}_A = 300/500 = 0.6$ and $\hat{p}_B =
200/500 = 0.4$. Substituting in Equation 6.8.1 gives the desired interval:

$$(0.6 - 0.4) \pm 1.96\sqrt{\frac{(0.6)(1 - 0.6)}{500} + \frac{(0.4)(1 - 0.4)}{500}} = 0.2 \pm 1.96(0.03)$$

so the limits of the interval are 0.14 and 0.26.


We are 95% confident that the true difference is between 0.14 and 0.26, because 
in the long run approximately 95% of the intervals constructed in this manner 
would include the true difference. 
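As a rough check on Example 6.8.1, the interval of Equation 6.8.1 can be computed directly. The sketch below is an illustration only; the function name `two_proportion_ci` and its argument names are assumptions introduced here, and the reliability factor is taken from the standard library's normal distribution.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Approximate confidence interval for p1 - p2 from independent samples (Equation 6.8.1)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Estimated standard error of the difference between the sample proportions.
    standard_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    difference = p1_hat - p2_hat
    return difference - z * standard_error, difference + z * standard_error


# Data of Example 6.8.1: 300 of 500 subscribers to A, 200 of 500 subscribers to B.
lower, upper = two_proportion_ci(300, 500, 200, 500)
print(f"{lower:.2f} to {upper:.2f}")
```

Because the sketch carries the standard error to more decimal places than the rounded 0.03 used in the example, its limits agree with 0.14 and 0.26 only after rounding to two decimals.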


Exercises 




6.8.1 Doctors who have developed a new drug for the treatment of a certain disease treat 
a group of 400 patients suffering from the disease with the new drug. They treat another 
group of 400 patients with an alternative drug. At the end of two weeks, 320 of the patients 
receiving the new drug recover, while 240 of those taking the alternative drug recover. 
Construct the 95% confidence interval for the difference between the true proportions of 
patients who might be expected to respond to the two drugs. 

6.8.2 A random sample of 350 salespersons and an independent random sample of 325 
executives are questioned about their reading habits. Of the 350 salespersons, 105 say that 
they subscribe to car magazines. Of the executives, 130 say they subscribe to car
magazines. Construct the 90% confidence interval for the difference between the true proportions
subscribing to car magazines. 

6.8.3 In a study of the types of errors made by employees in two factories owned by the
same firm, researchers note the following facts. Construct a 95% confidence interval for
$p_A - p_B$.


Factory                                             A       B

n                                                   200     225

Proportion of errors due to employee carelessness   0.32    0.25



6.8.4 A random sample of 200 female clerical workers and an independent random sample 
of 200 male clerical workers participate in a study conducted by a psychologist. In this 
study, 32 of the males and 11 of the females exhibit an intense dislike for their jobs. 
Construct a 99% confidence interval for the difference between the two population
proportions.


6.9 DETERMINING SAMPLE SIZE FOR ESTIMATING MEANS 

Up to this point, the problems and exercises have specified the sample size being 
used. But they have not mentioned how a particular sample size was decided on. 
One needs a method for determining how large a sample to take. Suppose that 
we want to estimate, with a confidence interval, the mean of a population. One 
of the first questions to arise is: How large should the sample be? We must consider 
this question seriously. It is a waste of resources to take a larger sample than we 
need to achieve the desired results. Similarly, if the sample is too small, the 
results may be of no practical value. The key questions that bear on this problem 
are: 

1. What precision is desired? That is, how close do we want our estimate to be 
to the true value? In other words, how wide would we like to make the confidence 
interval that we want to construct? 

2. How much confidence do we want to place in our interval? That is, what 
confidence coefficient do we wish to employ? 








These questions bring to mind the nature of the confidence interval that we will
eventually construct. This interval will be of the form

$$\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$$

if we can ignore the finite population correction factor. The quantity

$$z\frac{\sigma}{\sqrt{n}}$$

is equal to one-half the width of the confidence interval. If we can answer the first
question, we can set up the following equation:

$$d = z\frac{\sigma}{\sqrt{n}} \qquad (6.9.1)$$

where $d$ indicates how close to the true mean we want our estimate to be. That
is, $d$ is equal to one-half the desired interval width. If we solve Equation 6.9.1
for $n$, we obtain

$$n = \frac{z^2\sigma^2}{d^2} \qquad (6.9.2)$$

Thus, if we can specify $d$, $z$, and $\sigma^2$ in advance, it is a simple matter to find $n$.
We merely substitute the specified values into Equation 6.9.2.

The value we specify for $d$ varies from case to case. If we want a narrow
interval, $d$ will be small. The value of $z$ depends on the level of confidence we
want. And $\sigma^2$ depends on the variability present in the population of interest. As
a general rule, $\sigma^2$ is unknown and has to be estimated. The most frequently used
methods of estimating $\sigma^2$ are the following:

1. We may use the variance computed from a pilot or preliminary sample, drawn
from the population of interest, as an estimate of $\sigma^2$. We may count observations
used in the pilot sample as part of the final sample. Therefore the number of
observations we need after drawing the pilot sample is equal to $n - n_1$, where $n$
is equal to the computed sample size and $n_1$ is the number of observations in the
pilot sample.

2. We may have estimates of $\sigma^2$ from previous or similar studies.

3. If we feel that the population from which the sample is to be drawn is
approximately normally distributed, we may use the fact that the range is
approximately equal to 6 standard deviations to compute $\sigma \approx R/6$. This method requires
some knowledge of the smallest and largest values of the variable in the
population.

When sampling is without replacement from a finite population, the finite
population correction is appropriate. Equation 6.9.1 becomes

$$d = z\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} \qquad (6.9.3)$$

which, when solved for $n$, gives

$$n = \frac{Nz^2\sigma^2}{d^2(N - 1) + z^2\sigma^2} \qquad (6.9.4)$$

If we ignore the finite population correction, Equation 6.9.4 reduces to Equation
6.9.2.


EXAMPLE 6.9.1 An advertising firm wants to estimate the average amount of money 
a certain type of store spent on advertising during the past year. Experience has 
shown the population variance to be about 1,800,000. How large a sample should 
the advertising firm take in order for the estimate to be within $500 of the true 
mean with 95% confidence? 

Substituting the given data into Equation 6.9.2, we have

$$n = \frac{(1.96)^2(1{,}800{,}000)}{(500)^2} = 27.66 \approx 28$$

The advertising firm should take a sample of 28 establishments. (Note that $n$ is
always rounded up.)
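Equations 6.9.2 and 6.9.4 are easy to evaluate by machine. The following sketch is illustrative only: the function name `sample_size_mean` and the optional `population_size` argument are assumptions introduced here, and the population of 200 stores used in the second call is a made-up figure that does not appear in the example.

```python
from math import ceil


def sample_size_mean(z, variance, d, population_size=None):
    """Sample size needed to estimate a mean to within d.

    Uses Equation 6.9.2 when population_size is None and
    Equation 6.9.4 (finite population correction) otherwise.
    """
    if population_size is None:
        n = z ** 2 * variance / d ** 2
    else:
        N = population_size
        n = N * z ** 2 * variance / (d ** 2 * (N - 1) + z ** 2 * variance)
    # The computed value is always rounded up to the next whole observation.
    return ceil(n)


# Example 6.9.1: variance about 1,800,000, d = $500, 95% confidence (z = 1.96).
print(sample_size_mean(1.96, 1_800_000, 500))

# Same specifications with a hypothetical finite population of 200 stores.
print(sample_size_mean(1.96, 1_800_000, 500, population_size=200))
```

The first call returns 28, as in the example. With the finite population correction the required sample size can only stay the same or shrink, which is why the correction is worth applying when the sample would otherwise be a sizable fraction of the population.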


The Two-Sample Case By a straightforward extension of the method used to 
develop a formula for determining the sample size needed to estimate a single 
population mean, we also may derive a sample-size formula for the
two-independent-sample case. We assume either that the two populations to be sampled
are normally distributed or that the sample sizes will be large enough so that we 
can apply the central limit theorem. We also assume that the finite population 
correction factor is not needed. 

We again designate one-half of the desired interval width by $d$. We write

$$d = z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad (6.9.5)$$

Squaring both sides of this equation yields

$$d^2 = z^2\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

Let us assume that $n_1$ and $n_2$ are equal, that is, $n_1 = n_2 = n$. We have

$$d^2 = z^2\left(\frac{\sigma_1^2 + \sigma_2^2}{n}\right) \quad \text{or} \quad nd^2 = z^2(\sigma_1^2 + \sigma_2^2)$$

When we solve for $n$, we have

$$n = \frac{z^2(\sigma_1^2 + \sigma_2^2)}{d^2} \qquad (6.9.6)$$

To find $n$ for a given application, we need only specify $d$, the level of confidence
desired (in order to determine $z$), and the values of $\sigma_1^2$ and $\sigma_2^2$. The population
variances, usually unknown, may be estimated by the usual methods.

The following example illustrates the use of Formula 6.9.6. 


EXAMPLE 6.9.2 A researcher wishes to know whether the mean length of employment
with the current firm at time of retirement is different for men and women.
The researcher would like to have a confidence-interval estimate of the difference 
between the two population means. The specifications are a confidence-interval 
width of 1 year and 95% confidence. Pilot samples yielded variances of 5 and 7. 
The researcher wants samples of equal size. What size sample should be drawn 
from each population? 

By Equation 6.9.6, we have

$$n = \frac{(1.96)^2(5 + 7)}{(0.5)^2} = 184.4 \approx 185$$

We need a sample of 185 men and an independent sample of 185 women.
Again note that $d$ is equal to one-half the width of the desired confidence interval.
Thus, since the desired confidence-interval width is 1 year, $d$ is equal to 0.5 year.

Pentico (1981) discusses other aspects of determining the sample size for the 
two-sample situation. 
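Equation 6.9.6 can be evaluated the same way. In this sketch the function name `sample_size_two_means` is hypothetical, and the caller is assumed to pass the desired total interval width, which the function halves to obtain $d$.

```python
from math import ceil


def sample_size_two_means(z, variance_1, variance_2, interval_width):
    """Common sample size n for estimating a difference between two means (Equation 6.9.6)."""
    # d is one-half of the desired confidence-interval width.
    d = interval_width / 2
    n = z ** 2 * (variance_1 + variance_2) / d ** 2
    return ceil(n)  # round up, as in the one-sample case


# Example 6.9.2: pilot variances 5 and 7, interval width 1 year, 95% confidence.
print(sample_size_two_means(1.96, 5, 7, 1.0))
```

Taking the full width as the argument is a design choice intended to guard against the most common slip with this formula, which is substituting the interval width itself for $d$.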


Exercises 






6.9.1 A plastics firm wishes to estimate the mean impact strength of a spool. How many 
spools should the company test if it wishes to be within 20 psi of the true value with 99% 
confidence? Previous experience indicates that an acceptable estimate of $\sigma^2$ is 4900.

6.9.2 A consultant for a chain of motels wants to estimate the average number of miles 
driven per day by families on vacation. The consultant obtains the names and addresses 
of vacationing families who stayed at motels in the chain during the past year. How large 
a sample should the consultant select in order to estimate the average daily mileage to 
within 25 miles with 95% confidence? It is felt that a reasonable estimate of $\sigma^2$ is 18,000.

6.9.3 A researcher with a company that employs 2500 workers wishes to estimate the 
mean travel time between the company and the employees’ homes. The investigator wants 
a 99% confidence interval and an estimate that will be within 1 minute of the true mean. 
A small pilot sample yields a variance of 25 min$^2$. What size sample should the researcher
draw? 

6.9.4 A psychologist wants to construct an interval estimate of the mean IQ of a certain 
population of employees. The estimate is to be within 5 points of the true mean with 95% 
confidence. Previous experience indicates that the IQ’s for the population of interest are 
approximately normally distributed with a variance of 100. The psychologist wants to 
know how large a sample to draw from the population. 





6.9.5 A realtor wishes to estimate, for Areas A and B, the difference between the mean 
number of days elapsing between the time houses are placed on the market and the time 
they are sold. A confidence coefficient of 0.95 and an interval width of 10 days are desired. 
Pilot samples from the two areas yielded $s_A = 21$ and $s_B = 24$. What size sample should
be selected from each area ($n_1 = n_2$)?

6.9.6 For two populations of cigarette smokers, a market research firm wishes to estimate 
the difference between the mean number of cigarettes smoked per week. A confidence 
coefficient of 0.99 and an interval width of 20 are desired. Estimates of the population 
variances are 225 and 250. What size sample should be drawn from each population
($n_1 = n_2$)?


6.10 DETERMINING SAMPLE SIZE FOR ESTIMATING PROPORTIONS 

When we estimate a population proportion, we determine the sample size in about
the same way as described above for estimating a population mean. We set $d$, half
the desired interval width, equal to the product of the reliability coefficient and the
standard error. The assumption of random sampling and conditions warranting
approximate normality of the distribution of $\hat{p}$ lead to the following formula for
$n$, when sampling is with replacement or is from an infinite population:

$$n = \frac{z^2 pq}{d^2} \qquad (6.10.1)$$

where $q = 1 - p$. If sampling is without replacement, the proper formula for $n$
is

$$n = \frac{Nz^2 pq}{d^2(N - 1) + z^2 pq} \qquad (6.10.2)$$

When $N$ is large in comparison to $n$ (that is, $n/N \le 0.05$), we may ignore the
finite population correction, and Equation 6.10.2 reduces to Equation 6.10.1.

Both formulas require a knowledge of p , the proportion of elements in the 
population with the characteristic of interest. This is the parameter to be estimated. 
Obviously, it is unknown. Again, we may take a pilot sample and compute an 
estimate to use in place of p in the formula for n. Alternatively we may have 
some good notion of the likely value of p that we can use in the formula. For 
example, a personnel director who wants to estimate the proportion of employees 
who have not completed high school may feel that $p$ is about 0.10. In the formula
for $n$, 0.10 would be used for $p$.

If we can’t obtain a better estimate, we may set $p$ equal to 0.5 in the formula
for $n$. This gives a sample of sufficient size for the desired reliability and interval
width, since the product $pq$ is largest when $p = 0.5$ and this choice therefore
yields the maximum sample size. It may be larger than we need,
however, in which case the sample will be more expensive than it would have 
been had we had a better estimate of p available. Use this procedure only if you 
cannot obtain a better estimate of p. 






Exercises 


EXAMPLE 6.10.1 A market research firm wants to estimate the proportion of house¬ 
holds in a certain area that have color television sets. The firm would like to 
estimate p to within 0.05 with 95% confidence. No estimate of p is available. 

Since no better estimate of $p$ is available, we must use 0.5. When appropriate
substitutions are made in Equation 6.10.1, we have

$$n = \frac{(1.96)^2(0.5)(0.5)}{(0.05)^2} = 384.16 \approx 385$$
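Equations 6.10.1 and 6.10.2 can be sketched as a single function. The name `sample_size_proportion`, the default of 0.5 for `p`, and the optional `population_size` argument are assumptions made for this illustration.

```python
from math import ceil


def sample_size_proportion(z, d, p=0.5, population_size=None):
    """Sample size needed to estimate a proportion to within d.

    Uses Equation 6.10.1 when population_size is None and Equation 6.10.2
    otherwise. With no prior estimate of p, the default p = 0.5 gives the
    largest (most conservative) sample size.
    """
    pq = p * (1 - p)
    if population_size is None:
        n = z ** 2 * pq / d ** 2
    else:
        N = population_size
        n = N * z ** 2 * pq / (d ** 2 * (N - 1) + z ** 2 * pq)
    return ceil(n)


# Example 6.10.1: d = 0.05, 95% confidence, no estimate of p available.
print(sample_size_proportion(1.96, 0.05))

# The same specifications when p is believed to be about 0.10.
print(sample_size_proportion(1.96, 0.05, p=0.10))
```

The first call returns 385, as in the example. The second call returns a much smaller value, which illustrates how costly the default of 0.5 can be when a better estimate of $p$ is in fact available.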


The Two-Sample Case In a manner similar to that used to derive a sample-size 
formula for estimating the difference between two population means, we may 
derive a sample-size formula to estimate the difference between two population 
proportions. We assume that the populations to be sampled are large enough for 
us to apply the normal approximation to the binomial distribution. We also assume 
that the samples are of equal size. From the relationship

$$d = z\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} \qquad (6.10.3)$$

we derive, when $n_1 = n_2 = n$,

$$n = \frac{z^2[p_1(1 - p_1) + p_2(1 - p_2)]}{d^2} \qquad (6.10.4)$$

where $d$ is, as usual, one-half the desired interval width. The parameters $p_1$ and
$p_2$ are estimated in the usual ways.

EXAMPLE 6.10.2 For two populations of consumers, a researcher wants to estimate 
the difference between the proportions who have used a particular brand of coffee. 
A confidence coefficient of 0.95 and an interval width of 0.10 are desired.
Estimates of $p_1$ and $p_2$ are 0.20 and 0.25, respectively. How large should the sample
sizes be ($n_1 = n_2$)?

By Equation 6.10.4, we have

$$n = \frac{(1.96)^2[(0.20)(0.80) + (0.25)(0.75)]}{(0.05)^2} = 533.9824 \approx 534$$

The researcher should draw a sample of size 534 from each population.
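A minimal sketch of Equation 6.10.4 follows. The function name `sample_size_two_proportions` is hypothetical, and the function is assumed to receive the full interval width, from which it computes $d$ as one-half.

```python
from math import ceil


def sample_size_two_proportions(z, p1, p2, interval_width):
    """Common sample size n for estimating p1 - p2 (Equation 6.10.4)."""
    d = interval_width / 2  # one-half the desired interval width
    n = z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / d ** 2
    return ceil(n)


# Example 6.10.2: estimates 0.20 and 0.25, interval width 0.10, 95% confidence.
print(sample_size_two_proportions(1.96, 0.20, 0.25, 0.10))
```

For the data of the example the function returns 534 for each population.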


6.10.1 An urban university will offer Saturday classes if student demand is sufficiently 
high. What size sample of students would you poll in order to estimate with 95%
confidence, and to within 0.05, the proportion of students who would register for Saturday
classes, if offered? Assume that no estimate of p is available. 

6.10.2 A researcher in industrial medicine wants to determine what proportion of all shoe 
factories require that an employee provide a doctor’s certificate for three or more days’ 
absence for illness. What size sample should the researcher take in order to be within 0.05 
of the true value with 95% confidence? The researcher feels that the true proportion cannot 
be more than 0.30. 



6.10.3 A market-research analyst wishes to know how large a sample of the homes in a 
certain community to draw in order to find in what proportion of the homes at least one 
member has seen a certain newspaper ad. There are 500 homes in the community. The 
analyst wants to be within 0.04 of the true proportion with 90% confidence. In a pilot 
sample of 15 homes, 35% of the respondents indicate that someone in the home had seen 
the ad. How large a sample should be drawn? 

6.10.4 For two populations of drivers, an insurance executive wants to estimate the
difference between the proportions of those who regularly wear seat belts. A confidence
coefficient of 0.95 and an interval width of 0.12 are desired. Estimates of $p_1$ and $p_2$ are
0.25 and 0.18, respectively. How large should the samples be ($n_1 = n_2$)?

6.10.5 For two populations of employees, an industrial psychologist wishes to estimate 
the difference between the population proportions who have been sexually harassed at their 
place of employment. A confidence coefficient of 0.90 and an interval width of 0.14 are 
desired. How many employees should be selected from each population ($n_1 = n_2$)?
Estimates of $p_1$ and $p_2$ are 0.10 and 0.28, respectively.

6.11 CONFIDENCE INTERVAL FOR THE VARIANCE 
OF A NORMALLY DISTRIBUTED POPULATION 

We often want to know the magnitude of a population variance. Manufacturers 
of household appliances, for example, want to know the variability in the quality 
of their product in order to establish warranty periods. A drug manufacturer, in 
order to prepare truthful advertising copy, needs to know the variability of patient 
response to a given drug. 

We usually do not know population variances. Consequently we must estimate
them from sample data. In the typical case, interval estimates are more useful
than point estimates. We usually base confidence intervals for $\sigma^2$ on the sampling
distribution of $(n - 1)s^2/\sigma^2$.

We can approximate the sampling distribution of $(n - 1)s^2/\sigma^2$ empirically,
using the following steps.

1. Draw a large number of simple random samples of size $n$ from a normally
distributed population with a known variance $\sigma^2$.

2. Compute $s^2$ from the data of each sample.

3. Use each value of $s^2$ to compute $(n - 1)s^2/\sigma^2$.

4. Construct the frequency distribution of $(n - 1)s^2/\sigma^2$.

The resulting frequency distribution is an approximation of the sampling
distribution of $(n - 1)s^2/\sigma^2$, which follows a well-known distribution called
the chi-square distribution. If we were to graph this empirical distribution, it would
resemble a chi-square distribution with $n - 1$ degrees of freedom. The larger the
number of samples drawn, the closer the graph of the empirical sampling
distribution is to the corresponding chi-square distribution. Computer simulation is a
very effective method of constructing empirical sampling distributions of this type.
We shall discuss the chi-square distribution, designated by the Greek letter $\chi^2$, in
greater detail in Chapter 11.
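The four steps above lend themselves to a short simulation. The sketch below is one possible way to carry them out, not a prescribed procedure: the function name, the choices of $n = 10$ and $\sigma = 2$, the number of samples, and the random seed are all arbitrary assumptions, and the crude class intervals of width 3 stand in for a properly drawn frequency distribution.

```python
import random


def simulate_scaled_variances(n, sigma, num_samples, seed=0):
    """Steps 1 to 3: draw normal samples and compute (n - 1) s^2 / sigma^2 for each."""
    rng = random.Random(seed)
    values = []
    for _ in range(num_samples):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
        values.append((n - 1) * s2 / sigma ** 2)
    return values


values = simulate_scaled_variances(n=10, sigma=2.0, num_samples=20000)

# Step 4: a crude frequency distribution using class intervals of width 3.
counts = {}
for v in values:
    lower_limit = 3 * int(v // 3)
    counts[lower_limit] = counts.get(lower_limit, 0) + 1
for lower_limit in sorted(counts):
    print(f"{lower_limit:3d} to {lower_limit + 3:3d}: {counts[lower_limit]}")

print("average of simulated values:", sum(values) / len(values))
```

The mean of a chi-square distribution equals its degrees of freedom, so the average printed on the last line should fall close to $n - 1 = 9$, and the frequency counts should show the right-skewed shape typical of the distribution.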



Note that the chi-square distribution, unlike the normal and t distributions, is
asymmetric. Like the t distribution, the chi-square distribution is a family of
distributions. There is a different distribution for each possible value of degrees
of freedom, $n - 1$.

Figures 6.11.1 and 6.11.2 show some chi-square distributions for several values
of degrees of freedom. Table F gives percentiles of the chi-square distribution.

To construct a $100(1 - \alpha)\%$ confidence interval for $\sigma^2$, we first obtain an
interval about $(n - 1)s^2/\sigma^2$. We select two values of $\chi^2$ from Table F in such
a way that $\alpha/2$ is to the left of the smaller value and $\alpha/2$ is to the right of the
larger value. If we call these two values $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$, respectively, the
$100(1 - \alpha)\%$ confidence interval for $(n - 1)s^2/\sigma^2$ is given by

$$\chi^2_{\alpha/2} < \frac{(n - 1)s^2}{\sigma^2} < \chi^2_{1-\alpha/2} \qquad (6.11.1)$$

We can rewrite Equation 6.11.1 in such a way that we get an expression with $\sigma^2$
alone as the middle term. First we divide each term by $(n - 1)s^2$ to get

$$\frac{\chi^2_{\alpha/2}}{(n - 1)s^2} < \frac{1}{\sigma^2} < \frac{\chi^2_{1-\alpha/2}}{(n - 1)s^2} \qquad (6.11.2)$$

Taking the reciprocal of Equation 6.11.2 yields

$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} > \sigma^2 > \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}} \qquad (6.11.3)$$


FIGURE 6.11.1 Chi-square distributions for selected degrees of freedom

FIGURE 6.11.2 Chi-square distributions for selected degrees of freedom



Taking reciprocals changes the direction of the inequalities. If we reverse the
order of the terms, the result is

$$\frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{\alpha/2}} \qquad (6.11.4)$$

This is the $100(1 - \alpha)\%$ confidence interval for $\sigma^2$. Taking the square root of
each term in Equation 6.11.4 yields the following $100(1 - \alpha)\%$ confidence
interval for $\sigma$, the population standard deviation:

$$\sqrt{\frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}} < \sigma < \sqrt{\frac{(n - 1)s^2}{\chi^2_{\alpha/2}}} \qquad (6.11.5)$$


EXAMPLE 6.11.1 As part of its quality-control program, a firm that makes
wrought-iron sheets wants to estimate the variance of weight per square foot of its product.
A random sample of 51 specimens yields a variance of 0.021 lb$^2$. We want to
find a 95% confidence interval for $\sigma^2$.

The $\chi^2$ values for 50 degrees of freedom are $\chi^2_{1-\alpha/2} = 71.420$ and $\chi^2_{\alpha/2} =
32.357$. When we substitute these values and the information from the example
into Equation 6.11.4, we obtain the following interval:

$$\frac{(50)(0.021)}{71.420} < \sigma^2 < \frac{(50)(0.021)}{32.357}$$

$$0.0147 < \sigma^2 < 0.0325$$

To obtain a 95% confidence interval for $\sigma$, we take the square root of each term
in the interval for $\sigma^2$. This gives

$$0.1212 < \sigma < 0.1803$$
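Equations 6.11.4 and 6.11.5 can be sketched as follows. Python's standard library has no chi-square percentile function, so this illustration assumes the two critical values are looked up in Table F and passed in by the caller; the function name `variance_ci` and its argument names are hypothetical.

```python
from math import sqrt


def variance_ci(n, s2, chi2_lower, chi2_upper):
    """Confidence interval for a population variance (Equation 6.11.4).

    chi2_lower is the tabulated value with alpha/2 of the area to its left,
    chi2_upper the value with alpha/2 of the area to its right, both for
    n - 1 degrees of freedom. The caller supplies them from a table.
    """
    numerator = (n - 1) * s2
    # The larger chi-square value produces the lower limit, and vice versa.
    return numerator / chi2_upper, numerator / chi2_lower


# Example 6.11.1: n = 51, s^2 = 0.021, table values 32.357 and 71.420.
var_lower, var_upper = variance_ci(51, 0.021, 32.357, 71.420)
print(f"variance: {var_lower:.4f} to {var_upper:.4f}")

# Equation 6.11.5: square roots of the limits give the interval for sigma.
print(f"std dev:  {sqrt(var_lower):.4f} to {sqrt(var_upper):.4f}")
```

The variance limits agree with 0.0147 and 0.0325. Taking square roots of the unrounded limits gives an upper limit for $\sigma$ of about 0.1801; the value 0.1803 in the example results from taking the square root of the rounded limit 0.0325.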





This method of constructing confidence intervals for $\sigma^2$ is widely used, but it
is not without its drawbacks. The assumption of the normality of the population 
being sampled is crucial. Results are likely to be misleading if the assumption of 
normality is not met. Another difficulty stems from the fact that this method does 
not yield the shortest possible confidence intervals. We may use the tables given 
by Tate and Klett (1959) to overcome this difficulty. 


Exercises 





6.11.1 A random sample of 10 specimens of a certain material are tested for tensile 
strength. The variance computed from these data is 4. Construct the 95% confidence 
interval for $\sigma^2$. What assumption underlies the construction of this interval?

6.11.2 A production manager needs to know the time required to complete a certain task 
in a manufacturing plant. A study is designed in such a way that a random sample of 25 
observations is made available for analysis. The variance computed from the sample data
is 0.3 hour squared. (a) Construct the 95% confidence interval for $\sigma^2$. (b) Construct the
99% confidence interval. (c) Construct the 90% confidence interval. (d) What assumption
must we make in order to construct a valid confidence interval?

6.11.3 An ecologist measures the amount of pollutants in 15 samples of water from a 
stream located in an industrial area. She obtains $\sum (x_i - \bar{x})^2 = 508.06$. Construct a 95%
confidence interval for the population variance. 


6.12 RATIO OF THE VARIANCES OF TWO NORMALLY 
DISTRIBUTED POPULATIONS 

We often need to compare two population variances. For example, we may wish 
to compare the variances of the tensile strengths of steel wires manufactured by 
two different suppliers. Generally we prefer the product with the smaller variance. 
Suppose that each of the two suppliers of wire provides a product whose tensile 
strength is normally distributed with equal means. Suppose that we cannot use a 
segment of wire with a tensile strength less than k pounds per square inch. If the 
tensile strengths of Supplier A’s wire vary more than those of Supplier B’s wire, 
a larger proportion of Supplier A’s wire will be unusable. Figure 6.12.1 illustrates 
the truth of this statement. We see that the area to the left of k is greater for 
Supplier A’s product (the more variable product) than for Supplier B’s. Since area 
represents proportion of items, we see that the proportion of items less than k is 
greater for A’s product than for B’s. 

One way of comparing two population variances is to form the ratio $\sigma_1^2/\sigma_2^2$,
which, if the two variances are equal, is equal to 1. As a general rule, we do not
know the population variances. We have to base our comparisons on sample
variances. Since we must infer the magnitude of the ratio $\sigma_1^2/\sigma_2^2$ from the sample
results, the procedure depends on an appropriate sampling distribution.

We can use the distribution of $(s_1^2/\sigma_1^2)/(s_2^2/\sigma_2^2)$ provided that certain
assumptions are met. The assumptions are that $s_1^2$ and $s_2^2$ are computed from independent
samples of size $n_1$ and $n_2$, respectively, and that the samples are each drawn from
a normally distributed population. If these assumptions are met, $(s_1^2/\sigma_1^2)/(s_2^2/\sigma_2^2)$
follows a distribution known as the F distribution. See Table G.



FIGURE 6.12.1 Two normal distributions, showing that the area to the left of
$k$ is greater for the more variable distribution



We can approximate the sampling distribution of $(s_1^2/\sigma_1^2)/(s_2^2/\sigma_2^2)$ empirically
by using the following steps.

1. Draw a large number of simple random samples of size $n_1$ from a normally
distributed population with known variance $\sigma_1^2$.

2. Draw the same number of simple random samples of size $n_2$ ($n_2$ may or may
not be equal to $n_1$) from a second normally distributed population with known
variance $\sigma_2^2$.

3. Compute $s_1^2$ from the data of each sample of step 1.

4. Compute $s_2^2$ from the data of each sample of step 2.

5. For every possible pair of sample variances $s_1^2$ and $s_2^2$, form the ratio
$(s_1^2/\sigma_1^2)/(s_2^2/\sigma_2^2)$.

6. Construct the frequency distribution of the values computed in step 5.

The resulting frequency distribution is an approximation of the sampling
distribution of $(s_1^2/\sigma_1^2)/(s_2^2/\sigma_2^2)$, which is distributed as the F distribution. As noted
in the discussion of the sampling distribution of $(n - 1)s^2/\sigma^2$, we can use
computer simulation in constructing this empirical sampling distribution.
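A simulation along the lines of these six steps might look like the sketch below. It is a simplified illustration under stated assumptions: each sample from the first population is paired with just one sample from the second, rather than with every possible one, and the sample sizes, standard deviations, number of pairs, and seed are arbitrary choices. All function names are hypothetical.

```python
import random


def sample_variance(sample):
    """Sample variance with divisor n - 1."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)


def simulate_variance_ratios(n1, sigma1, n2, sigma2, num_pairs, seed=0):
    """Simulate (s1^2 / sigma1^2) / (s2^2 / sigma2^2) for paired independent normal samples."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(num_pairs):
        s1_sq = sample_variance([rng.gauss(0.0, sigma1) for _ in range(n1)])
        s2_sq = sample_variance([rng.gauss(0.0, sigma2) for _ in range(n2)])
        ratios.append((s1_sq / sigma1 ** 2) / (s2_sq / sigma2 ** 2))
    return ratios


ratios = simulate_variance_ratios(n1=6, sigma1=3.0, n2=11, sigma2=5.0, num_pairs=20000)

# Proportion of simulated ratios below 1, and their average.
print("proportion below 1:", sum(r < 1 for r in ratios) / len(ratios))
print("average ratio:", sum(ratios) / len(ratios))
```

For an F distribution with more than 2 denominator degrees of freedom, the mean is the denominator degrees of freedom divided by that number less 2. With 10 denominator degrees of freedom, as here, the average ratio should therefore fall close to $10/8 = 1.25$, a value greater than 1 that reflects the asymmetry of the distribution.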

Chapter 8 discusses the F distribution more completely. To obtain a value of
$F$ from the table, we specify two degrees of freedom values, one corresponding
to the value of $n_1 - 1$ used in computing $s_1^2$, the other corresponding to the value
of $n_2 - 1$ used in computing $s_2^2$. These quantities are referred to as the numerator
degrees of freedom and the denominator degrees of freedom, respectively.

Figure 6.12.2 shows F distributions for various sets of values of degrees of
freedom. Note that, like chi-square distributions, F distributions are asymmetric.

To find the $100(1 - \alpha)\%$ confidence interval for $\sigma_1^2/\sigma_2^2$, we begin with the
expression

$$F_{\alpha/2} < \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} < F_{1-\alpha/2} \qquad (6.12.1)$$





FIGURE 6.12.2 The F distribution for various degrees of freedom. (From
Documents Geigy, Scientific Tables, 7th ed., 1970. Courtesy of Ciba-Geigy Limited,
Basle, Switzerland)


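where $F_{\alpha/2}$ and $F_{1-\alpha/2}$ are the values from the F table (Table G) to the left and
right of which, respectively, lie $\alpha/2$ of the area under the curve. We may rewrite
the middle term of this expression so that the entire expression is

$$F_{\alpha/2} < \frac{s_1^2}{s_2^2} \cdot \frac{\sigma_2^2}{\sigma_1^2} < F_{1-\alpha/2} \qquad (6.12.2)$$

Dividing through by $s_1^2/s_2^2$ gives

$$\frac{F_{\alpha/2}}{s_1^2/s_2^2} < \frac{\sigma_2^2}{\sigma_1^2} < \frac{F_{1-\alpha/2}}{s_1^2/s_2^2} \qquad (6.12.3)$$

If we take the reciprocals of the three terms, we obtain

$$\frac{s_1^2/s_2^2}{F_{\alpha/2}} > \frac{\sigma_1^2}{\sigma_2^2} > \frac{s_1^2/s_2^2}{F_{1-\alpha/2}} \qquad (6.12.4)$$

Reversing the order of the terms gives the following $100(1 - \alpha)\%$ confidence
interval for $\sigma_1^2/\sigma_2^2$:

$$\frac{s_1^2/s_2^2}{F_{1-\alpha/2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2/s_2^2}{F_{\alpha/2}} \qquad (6.12.5)$$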


Table G does not contain entries for $\alpha/2$. We can get the value of F corresponding 
to $\alpha/2$ by taking the reciprocal of the F value corresponding to $1 - \alpha/2$ 
when the numerator and denominator degrees of freedom have been interchanged. 
Suppose, for example, that $\alpha = 0.05$, and the numerator and the denominator 
degrees of freedom are 5 and 10, respectively. From Table G we find 
$F_{1-\alpha/2,\,5,\,10} = F_{0.975,\,5,\,10} = 4.24$. When we reverse the numerator and denominator 
degrees of freedom, the tabulated value of F is $F_{1-\alpha/2,\,10,\,5} = 6.62$. We now 
compute $F_{\alpha/2,\,5,\,10} = 1/6.62 = 0.15$. 

We construct the interval by placing the larger of the two sample variances in 
the numerator of the ratio of the sample variances. For example, suppose that we 
want a confidence interval for the ratio of two variances $\sigma_1^2$ and $\sigma_2^2$, and that $s_2^2$ 
is greater than $s_1^2$. We obtain the interval from 

$$\frac{s_2^2/s_1^2}{F_{1-\alpha/2}} < \frac{\sigma_2^2}{\sigma_1^2} < \frac{s_2^2/s_1^2}{F_{\alpha/2}}$$


EXAMPLE 6.12.1 You are conducting an experiment to compare the life of a certain 
product produced by two different methods. The variance of the lives of a random 
sample of 16 items produced by Method 1 is 1200 hr$^2$. A random sample of 21 
items produced by Method 2 gives a variance of 800 hr$^2$. A 95% confidence 
interval for $\sigma_1^2/\sigma_2^2$ is desired. 

Use Equation 6.12.5 to obtain the following interval: 

$$\frac{1200/800}{2.57} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{1200/800}{1/2.76}$$

$$0.58 < \frac{\sigma_1^2}{\sigma_2^2} < 4.14$$


Give this interval the usual probabilistic and practical interpretations. 
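
The arithmetic of Example 6.12.1 can be sketched in a few lines of Python. This is only an illustration: the function name `variance_ratio_ci` and its parameter names are hypothetical, and the two F values are read from Table G and passed in by hand rather than computed.

```python
def variance_ratio_ci(s1_sq, s2_sq, f_upper, f_upper_swapped):
    """Confidence interval for sigma_1^2 / sigma_2^2 (Equation 6.12.5).

    f_upper:         tabulated F_{1-alpha/2} with (n1 - 1, n2 - 1) degrees of freedom
    f_upper_swapped: tabulated F_{1-alpha/2} with (n2 - 1, n1 - 1) degrees of freedom
    """
    ratio = s1_sq / s2_sq
    # F_{alpha/2} is the reciprocal of the upper value with the
    # degrees of freedom interchanged.
    f_lower = 1.0 / f_upper_swapped
    return ratio / f_upper, ratio / f_lower


# Example 6.12.1: s1^2 = 1200 (n1 = 16), s2^2 = 800 (n2 = 21), 95% confidence.
lower, upper = variance_ratio_ci(1200, 800, f_upper=2.57, f_upper_swapped=2.76)
print(f"{lower:.2f} < ratio < {upper:.2f}")  # 0.58 < ratio < 4.14
```

Passing the table values as arguments keeps the sketch close to the hand calculation. A statistical library could supply the F percentiles instead, but that is outside what the text describes.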


Exercises 





6.12.1 We are comparing two brands of transistors with respect to output. The variance 
of Brand A, based on a random sample of size 21, is 110. For Brand B, $n_B = 25$ and 
$s_B^2 = 185$. Construct the 99% confidence interval for the ratio of the two population 
variances. What assumptions must we make? 

6.12.2 A random sample of 31 apples is stored under standard conditions. A random 
sample of 25 apples is stored under what are purported to be improved conditions. The 
variable of interest is number of months elapsing before onset of deterioration. The sample 
variance under standard conditions is 4 months squared. Under the new conditions, the 
variance is 1.5 months squared. Construct the 95% confidence interval for the ratio of the 
two population variances. What assumptions must we make? 

6.12.3 Following an exercise program, a physician gives two groups of executives tests 
of their muscular endurance. The scores of group 1, consisting of 16 subjects, yield a 
sample variance of 4685.40. For group 2, consisting of 25 subjects, the sample variance 
is 1193.70. The physician is willing to assume that the two groups of scores constitute 
independent simple random samples from normally distributed populations. Construct a 
95% confidence interval for $\sigma_1^2/\sigma_2^2$. 


Summary 

This chapter introduced statistical inference procedures. The previous chapters 
provided the basic foundation for this and for the material in later chapters. In 
this chapter, the primary inference procedure discussed is estimation. 

We first talked about point estimation and the properties of good estimators. 
You learned that a point estimate consists of a single numerical value computed 
from a sample. Since you cannot attach a statement of confidence to a point 
estimate alone, it is of limited value. Interval estimates or confidence intervals 
are of much greater use, since you can explicitly state the confidence you have 
that the interval contains the parameter you are estimating. The degree of confidence 
is equal to the percentage (or proportion) of all similarly constructed intervals 
that contain the parameter. 

You learned that in many instances the general formula for a two-sided confidence 
interval is 

Estimate ± (reliability factor) × (standard error) 

In this formula the degree of confidence we need determines the reliability factor. 
We compute the estimate and usually the standard error from sample data. 
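
As a minimal sketch of this general formula, the following Python fragment builds a z-based interval. The function name `z_interval` and the data are hypothetical, and the sketch assumes the reliability factor is z (that is, the sampling distribution of the estimate is normal with a known standard error).

```python
from math import sqrt
from statistics import NormalDist


def z_interval(estimate, standard_error, confidence=0.95):
    """Return (lower, upper) = estimate -/+ z * standard_error."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # reliability factor
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical data: sample mean 50, known sigma 10, n = 100.
standard_error = 10 / sqrt(100)
low, high = z_interval(50, standard_error, confidence=0.95)
print(f"({low:.2f}, {high:.2f})")  # (48.04, 51.96)
```

Raising the confidence level raises the reliability factor and so widens the interval, while the estimate and standard error stay the same.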

This chapter also introduced you to two possible reliability factors, z and t. 
Which reliability factor you should use depends on the specific situation. You 
learned the criteria for choosing between the two. You may consult Table 7.2.1 
and Figures 7.2.2 and 7.2.3 when choosing between z and t. 

We can interpret a confidence interval in two ways. The probabilistic interpretation 
is stated in terms of what proportion of all similarly constructed intervals 
contain the estimated parameter. The practical interpretation is stated in terms of 
the degree of confidence we attach to the single interval that is computed. 
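
The probabilistic interpretation can be illustrated by simulation. The sketch below is an assumption-laden illustration, not a procedure from the text: it draws repeated samples from a hypothetical normal population with known standard deviation, builds a 95% z interval from each, and reports the fraction of intervals that contain the true mean. The function name `coverage` and all parameter values are invented for the illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu=100.0, sigma=15.0, n=25, confidence=0.95, trials=2000, seed=1):
    """Fraction of simulated z intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / sqrt(n)  # sigma is treated as known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials


print(coverage())  # a value close to 0.95 is expected
```

Any single interval either contains the mean or does not. The 95% describes the long-run proportion of such intervals, which is what the simulation estimates.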

In this chapter you learned to construct confidence intervals for population 
means, population proportions, population variances, the difference between two 
population means, the difference between two population proportions, the ratio 
of two population variances, and a mean of a population of paired differences. 

You also learned how to compute the sample size you need to obtain a confidence 
interval for a population mean, a population proportion, the difference 
between two means, and the difference between two proportions. 
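
A short sketch of the sample-size calculations for a single mean and a single proportion follows. It assumes the usual large-sample formulas $n = z^2\sigma^2/d^2$ and $n = z^2 p(1 - p)/d^2$, with no finite population correction. The function names and the planning values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def z_value(confidence):
    """Two-sided reliability factor z for the given confidence level."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)


def n_for_mean(sigma, d, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= d."""
    return ceil((z_value(confidence) * sigma / d) ** 2)


def n_for_proportion(p, d, confidence=0.95):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= d."""
    return ceil(z_value(confidence) ** 2 * p * (1.0 - p) / d ** 2)


# Hypothetical planning values, not taken from the text.
print(n_for_mean(sigma=15, d=3))        # 97
print(n_for_proportion(p=0.5, d=0.05))  # 385
```

Rounding up with `ceil` ensures that the interval is at least as narrow as required.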


Review Questions 


1. What is statistical inference? 

2. Why is estimation an important type of inference? 

3. What is a point estimate? 

4. Explain the meaning, as applied to estimators, of: (a) unbiasedness, (b) efficiency, 
(c) sufficiency, (d) consistency. 

5. Define the following: (a) reliability coefficient, (b) confidence coefficient, (c) standard 
error, (d) estimator. 

6. Give the general formula for a confidence interval. 

7. State the probabilistic and practical interpretations of a confidence interval. 

8. Of what use is the central limit theorem in estimation? 

9. Describe the t distribution. 

10. What are the assumptions underlying the use of the t distribution in estimating a single 
population mean? 

11. What is the finite population correction? When can it be ignored? 

12. What assumptions underlie the use of the t distribution to estimate the difference 
between two population means? 








13. What assumption underlies the construction of (a) a confidence interval for a population 
variance, (b) a confidence interval for the ratio of two population variances? 

14. What is the rationale underlying the pooling of sample variances when one is testing 
the difference between means? 

15. The widths of metal bars produced by a certain firm are normally distributed with a 
standard deviation of 0.02 in. A random sample of 25 bars is measured, and a mean of 
2.49 in. is computed. Construct the 95% confidence interval for $\mu$. 

16. A quality-control engineer with a paper manufacturer wishes to estimate the mean 
diameter of a large shipment of logs. From a random sample of 49 logs, the engineer 
computes a mean of 32 in. and a standard deviation of 3.5 in. (a) Construct the 95% 
confidence interval for $\mu$. (b) Construct the 99% confidence interval. (c) Construct the 
90% confidence interval. 

17. Describe a situation from your particular area of interest in which a confidence interval 
for a mean difference would be meaningful. Use real or realistic data to obtain a sample 
of paired observations. Construct a 95% confidence interval for the mean difference. 

18. A manufacturer designs a study to assess the effectiveness of an additive in prolonging 
the shelf life of bath salts. Two independent, simple random samples consisting of 100 
specimens with and 100 specimens without the additive are stored under identical conditions. 
The average life of the specimens with the additive is 32 months ($s^2 = 90$). For 
the specimens without the additive, the mean and variance are 24 and 160, respectively. 
The researcher is unwilling to assume that the populations are normally distributed. (a) 
Construct the 95% confidence interval for the difference between the population means. 
(b) Construct the 90% confidence interval. (c) Construct the 99% confidence interval. 

19. A random sample of 200 items produced in a certain factory is inspected for defectives. 
Ten defective items are discovered. Construct the 95% confidence interval for the true 
proportion defective. 

20. A random sample of 100 people in a small town are asked to evaluate two brands of 
coffee. Of the sample, 70 say they prefer Brand A. (a) Construct the 95% confidence 
interval for the true proportion preferring Brand A. (b) Construct the 90% confidence 
interval. (c) Construct the 99% confidence interval. 

21. Two apple orchards are sprayed with two different insecticides to prevent infestation 
of the mature fruit by fruit flies. At harvest time, a random sample of 500 apples from 
each orchard is examined, with the following results: Of the 500 apples from trees sprayed 
with Insecticide A, 50 are infested. Of those from trees sprayed with Insecticide B, 25 are 
infested. Construct the 95% confidence interval for the difference between the true 
proportions infested. 

22. A firm specializing in direct-mail questionnaires has developed a new format and 
technique that it believes will get a higher response rate than the standard procedure. How 
large a sample of a particular type of respondent should the firm use in order to estimate 
within 0.03 the true proportion who would respond to the new procedure? The desired 
confidence level is 95%. Of those included in a pilot sample, 80% responded. 

23. Each member of a random sample of 36 sixth-grade children keeps a record for one 
week of the amount of time spent watching television. The mean and standard deviation 
computed from the results are 15 hours and 6 hours, respectively. Construct a 99% 
confidence interval for the population mean. 

24. A random sample of 100 records kept by a utility company on wood utility poles 
placed in service since 1900 reveals a sample mean life and standard deviation of 10.5 
years and 5 years, respectively. Construct a 95% confidence interval for the population 
mean. 

25. A researcher selects a random sample of 150 wood utility poles currently in service. 
The survey reveals that 15% of the poles need to be replaced. Construct a 99% confidence 
interval for the population proportion in need of replacement. 

26. A random sample of 169 households in a certain area is selected as part of a study of 
the recreation habits of community residents. The respondents indicate a mean amount of 
$350 spent annually for recreation per family. The sample standard deviation is $65. 
Construct a 95% confidence interval for the population mean. 

27. Of the households referred to in question 26, 60% had two or more children. Construct 
the 95% confidence interval for the proportion of households in the area with two or more 
children. 

28. In a survey of adult residents of a rural area, 80 out of 150 respondents say they prefer 
a certain type of music. Construct a 99% confidence interval for the population proportion. 

29. A manufacturer of fire alarm systems makes an alarm that is sensitive to smoke. The 
quality-control department tests a random sample of 15 alarms to determine the level of 
concentration of smoke required for activation. The results, coded for ease of calculation, 
are as follows: 3, 8, 8, 9, 9, 6, 9, 6, 2, 5, 6, 4, 8, 7, 5. Construct a 95% confidence 
interval for the population mean. 

30. A radio station conducts a survey to determine what local citizens perceive to be the 
most pressing national problems. Of 350 adults contacted, 80 say that they feel declining 
moral standards are the most serious problem. Construct the 90% confidence interval for 
the population proportion that holds this opinion. 

31. The editor of a newspaper wishes to know what proportion of the subscribers regularly 
read the business news section. In a random sample of 500 subscribers, 200 state that they 
are regular readers of business news. Construct a 90% confidence interval for the true 
proportion of all subscribers who regularly read the business news. 

32. The mean weight of a sample of 100 trucks weighed at a certain highway weigh station 
is found to be 50,000 lb with a standard deviation of 3600 lb. Construct a 95% confidence 
interval for the population mean. 

33. During a winter shortage of natural gas, citizens are asked to lower the thermostats 
in their homes to 65° or lower. A random sample of 150 households in a certain town 
reveals that 130 have thermostats set at 65° or lower. Construct the 95% confidence interval 
for the proportion of all households in the community that have lowered thermostats. 

34. A survey of 800 regular listeners to a certain radio station reveals that 600 are 
teenagers. Construct the 90% confidence interval for the true proportion of teenagers in the 
audience. 

35. In a random sample of 100 households in a suburb, selected as part of a study of 
energy consumption, the mean amount spent for electricity during December is $42, with 
a standard deviation of $50. Construct a 90% confidence interval for the population mean. 

36. Of 29 applicants for a position as mechanic with a certain firm, 13 have just completed 
a six-month training course at a county vocational school; 16 have learned their trade 
through on-the-job experience. Each applicant is given the same mechanical proficiency 
test. The variance of the scores for the first group is 525. For the second group, the 
variance is 350. Construct a 90% confidence interval for the ratio of the two population 
variances. What assumptions are necessary? 








37. A researcher with a textile firm randomly divides 21 specimens of fabric into two 
groups. Each group is treated by a different method to make the fabric water repellent. 
The specimens are then tested for ability to repel water, with the following results: $n_1 = 11$, 
$s_1^2 = 280$, $n_2 = 10$, $s_2^2 = 200$. Assume that the data constitute independent simple 
random samples from normally distributed populations, and construct a 90% confidence 
interval for the ratio of the two population variances. 

38. A research chemist wishes to estimate the mean amount of oxygen (in liters) required 
to bring about a particular chemical reaction when the oxygen is mixed with a fixed amount 
of sulfur. The researcher wants to be within 0.10 liter of the true mean with 95% 
confidence. Previous studies indicate that the variance of the oxygen requirements for this type 
of chemical reaction is about 0.09. What size sample does this investigator need? 

39. A student working for a doctorate in education wishes to draw a sample of high school 
freshmen in order to estimate the average amount of time per day they spend studying. A 
standard deviation of 20 minutes is reported by a researcher who conducted a similar study. 
The student wants a 95% confidence interval for the population mean. How large should 
the sample be if the estimate should be within three minutes of the true value? 

40. The following data give the means and standard deviations of nonverbal IQ scores 
obtained from independent simple random samples drawn from two populations of factory 
employees: For Sample I, $n = 19$, $\bar{x} = 110$, and $s = 10$. For Sample II, $n = 23$, $\bar{x} = 95$, 
and $s = 15$. Assume that each of the two populations of nonverbal IQ scores is 
approximately normally distributed with equal variances. Construct the 95% confidence 
interval for the difference between the two population means. 

41. New employees hired to perform highly technical tasks are randomly assigned to one 
of two classes for training. Class A uses a computer-assisted instruction technique to teach 
employees the fundamentals of the job. Instruction in Class B follows traditional patterns. 
Performance tests are administered to each employee in the study after six months on the 
job. The results are as follows: For Class A, $n = 35$, $\bar{x} = 85$, and $s = 10$. For Class B, 
$n = 32$, $\bar{x} = 71$, and $s = 15$. Construct a 95% confidence interval for $\mu_A - \mu_B$. 

42. A manufacturer who wishes to increase employee production selects a department 
with 12 employees for an experiment. The manufacturer tries to improve working 
conditions in this department through renovation and employee incentives. The following table 
shows the mean number of items produced per day by the employees one month before 
and one month after the changes are made. Construct the 95% confidence interval for the 
mean difference. 


Mean number of items produced per day 

Employee    1    2    3    4    5    6    7    8    9   10   11   12 
Before     75   61   62   68   58   70   59   79   68   80   64   75 
After      82   70   74   80   65   80   70   88   77   90   75   87 


43. An economist is studying the attitudes of local citizens toward the national energy 
program. As part of the proposed interview, respondents will be asked to indicate whether 
they agree or disagree with the statement, “The federal government should establish a 
strong energy department.” She wants to know how large a sample to take in order to 
estimate the true proportion of citizens who agree with the statement. She wishes to be 
within 0.025 of the true value with 95% confidence. Researchers who conducted a similar 
study in another locality found that 60% of the people interviewed agreed with the 
statement. How large should the sample be? 

44. The personnel director of a large firm wants to estimate the proportion of the 2000 
employees of the firm who plan to go out of the state during vacation. How large a sample 
should the personnel director take if the estimate is to be within 0.05 of the true value 
with 99% confidence? Last year 70% of the employees polled went out of state during 
their vacations. 

45. Researchers with an agrichemical company deliberately infect 200 plants with a certain 
disease. They then treat half the plants with Chemical A and half with Chemical B. Of 
the plants treated with Chemical A, 75 survive. Of those treated with Chemical B, 64 
survive. Construct a 90% confidence interval for $p_A - p_B$. 

46. A random sample of 300 blue-collar workers in a certain city reveals that 75% are 
planning to vote for a particular candidate for mayor. Of a random sample of 200 white- 
collar workers, 66% state that they are planning to vote for the candidate. Construct a 
95% confidence interval for the difference between the two population proportions. 

47. As part of an experiment, scientists weigh a random sample of 20 mice. The sample 
yields $\sum (x_i - \bar{x})^2 = 72.25$. Construct a 95% confidence interval for $\sigma^2$. What assumptions 
must you make? 

48. The following are the tensile strengths of 10 specimens of yarn selected at random 
from a day’s production at a textile plant, coded for ease of computation: 66, 37, 18, 31, 
85, 63, 73, 83, 65, 80. Assume that tensile strengths are normally distributed. Construct 
a 99% confidence interval for $\sigma^2$. 




49. A simple random sample of 15 employees who work downtown reported the following 
distances (in miles) traveled to work each day: 13, 21, 35, 10, 24, 35, 19, 11, 25, 17, 
25, 11, 11, 6, 6. Construct a 95% confidence interval for the mean distance traveled by 
the population of employees from which the sample was drawn. 

50. Each of a random sample of 9 automobiles of a certain make was test-driven. The 
number of miles obtained per gallon of gasoline was recorded for each. The results were 
as follows: 23, 25, 21, 22, 23, 22, 21, 24, 22. Construct a 99% confidence interval for 
the population mean. 

51. During a local health fair, a simple random sample of 10 business executives suffering 
from hypertension yielded the following systolic blood pressure readings: 171, 190, 157, 
181, 178, 176, 167, 165, 198, 165. Construct a 90% confidence interval for the population 
mean. 

52. For two populations of employees, a researcher wishes to estimate the difference 
between the proportions that have ever been fired from a job. A confidence coefficient of 
0.99 and an interval width of 0.10 are desired. Estimates of $p_1$ and $p_2$ are 0.12 and 0.28, 
respectively. What equal-sized samples should be drawn? 

53. Researchers have found that it is not profitable for your firm to try to sell a new 
product to persons who live in houses that have fewer than 7 rooms. In fact, the researchers 
have recommended that at least 30% of houses in prospective market areas should have 7 
rooms or more. You want to know whether you should enter a new market area. You ask 
a market research firm to conduct a survey of households in the area to find out the number 
of rooms per dwelling unit. The results, for a random sample of 60 dwelling units, are as 
follows. 

Construct a 95% confidence interval for the proportion of dwelling units in the population 
with 7 or more rooms. Should you market your product in the area? 

54. A sample of 25 apples from a truckload shipment yielded the following weights in 
ounces: 16, 11, 14, 20, 16, 13, 14, 17, 10, 18, 11, 20, 14, 15, 19, 16, 10, 15, 16, 14, 
15, 20, 18, 18, 20. 

(a) Compute the mean weight of the apples in the sample. 

(b) Compute the median weight. 

(c) Compute the sample variance and standard deviation. 

(d) Use these data to construct a 95% confidence interval for the population mean. 

(e) Before vouching for the legitimacy of the inference procedure in part (d), is there any 
further information you would like to have? 

(f) The owner of the orchard in which the apples were grown claims that the mean weight 
of the apples in the truck is 16 ounces. Do you think these data support the owner’s claim? 
Explain. 

55. Quality-control experts with a garment manufacturer have found that, on the average, 
an employee performs 30 defective operations in a week when morale is high. When 
morale is low, the number of defective operations is higher. During a recent week, a 
random sample of 12 employees performed the following numbers of defective operations: 
40, 37, 32, 34, 31, 36, 34, 32, 33, 31, 35, 33. Construct a 99% confidence interval for 
the mean number of defective operations performed by the employees in the sampled 
population. On the basis of these data, does it appear that morale may be low? 

56. A sales manager has found that the salespeople who spend more time per customer 
call are more successful. The most successful salespeople spend, on the average, 50 
minutes per customer call. A random sample of 20 sales calls from the records of a new 
salesperson reveals the following amounts of time in minutes spent on sales calls: 35, 41, 
24, 26, 25, 53, 41, 40, 34, 39, 57, 23, 28, 53, 30, 79, 81, 90, 79, 88. Construct a 95% 
confidence interval for the population mean. Does it appear from these data that the new 
salesperson may become one of the firm’s more successful salespeople? 

57. A researcher wants to estimate, for a large population of employees, the proportion 
who have ever sought professional help for an emotional or mental problem. Another 
researcher found the proportion in a similar population to be 0.15. The present researcher, 
who wants to be within 0.03 of the true proportion with 99% confidence, wants to know 
how large a sample to draw. 

58. For two populations of employees, an insurance investigator wants to estimate the 
difference between the mean number of accidents in which the employees have been 
involved during the past 10 years. A confidence coefficient of 0.90 and an interval width 
of 4 are desired. Estimates of the population variances are 28 and 20. How large should 
the sample sizes be ($n_1 = n_2$)? 

59. Select a simple random sample of size 50 from the population of employed heads of 
household given in Appendix II. Construct a 95% confidence interval for the proportion 
of women in the population. Compare your results with those of your classmates and 
determine how many of the intervals constructed by the class include the true proportion, 
which is 0.20. 

60. Select a simple random sample of size 50 from the population of employed heads of 
households given in Appendix II. Then do the following: 

(a) Construct a 95% confidence interval for the population mean commuting distance from 
work, using the finite population correction factor. 

(b) From the same data, construct a 95% confidence interval for the same population 
mean, but ignore the finite population correction factor. Compare this interval with the 
one you found in step (a). 

(c) Construct a 95% confidence interval for the mean annual salary of the persons in the 
population. 



Effect of Marathon Group Therapy on Drug Abuse Patients 


The responsible manager is concerned with both the physical and mental well-being 
of employees. Employees with personal problems, either physical or mental, 
constitute a drain on productivity because of absenteeism, and because 
of their inefficiency while on the job. An ability to recognize the symptoms 
of personal problems among employees, plus a knowledge of available 
therapeutic resources, enhances the effectiveness of the manager. 

One personal problem with which the modern manager often has to cope 
is drug abuse among subordinates. Therapists have advanced the idea that 
drug-rehabilitation programs need to focus on changing the behavior and attitudes 
of persons with drug problems. Page and Mannion* conducted a study 
to assess the effects of a 16-hour marathon therapy group, conducted with 
patients in a residential drug-treatment center. Twenty-eight subjects were 
randomly assigned to either a marathon therapy group (n = 12) or a control 
group (n = 16). The control group received no therapy. After the marathon 
therapy, an evaluative test was given to all 28 subjects at the same time. The 
investigators assessed the effects of the marathon group therapy on the attitudes 
of the participants through the use of the semantic differential technique 
(C. E. Osgood, G. T. Suci, and P. H. Tannenbaum, The Measurement of Meaning, 
Urbana, Ill., University of Illinois Press, 1957). The mean scores and standard 
deviations on the group-counseling (E) subscale of the semantic differential 
were as follows. 



                   Mean    s 
Marathon group     5.85    0.75 
Control group      5.25    0.76 


Construct a 95% confidence interval for the difference between population 
means. What assumptions are necessary for this inferential procedure to be 
valid? 


*Richard C. Page and John Mannion, "Marathon Group Therapy with Former Drug Users," Journal of 
Employment Counseling, 17 (1980), 307-313. 



7. Statistical Inference II: 
Hypothesis Testing 


Chapter Objectives: In this chapter we discuss the second 
type of statistical inference, hypothesis testing. You 
will note some similarities as well as differences between 
hypothesis testing and the interval estimation 
that you learned about in Chapter 6. The same parameters 
are of interest. However, in this chapter you will 
analyze sample data to see whether they support or fail 
to support a speculation (hypothesis) about the magnitudes 
of the parameters. In Chapter 6, you were not concerned 
with preanalysis conjectures about parameters. 
Instead, you used sample data to help you form an opinion 
about the magnitudes of parameters. 

After you have studied this chapter and worked the 
exercises, you should be able to do the following. 

1. List seven steps that you can follow in testing a hypothesis 

2. Conduct tests of hypotheses about values of the following 
parameters: (a) a population mean, (b) a population 
proportion, (c) a population variance, (d) the 
difference between two population means, (e) the difference 
between two population proportions, (f) the 
ratio of two population variances, (g) a mean of a 
population of paired differences 

3. Compute a p value for each test 

4. Calculate the power of a test for a specific alternative 
hypothesis about the population mean 

5. Determine the sample size required if a test is to meet 
certain specified conditions 



7.1 INTRODUCTION 


There are two types of statistical inference—estimation, which was covered in 
Chapter 6, and hypothesis testing, which is the subject of this chapter. 

The purpose of hypothesis testing, like that of estimation, is to help one reach 
a decision about a population by examining the data contained in a sample 
from that population. 

In the examples and exercises of this chapter, the samples we refer to are simple 
random samples. In Section 7.2, we cover some general concepts of hypothesis 
testing. In succeeding sections, we shall cover specific tests of hypotheses in 
detail. 

Again, the sampled population may not always be the same as the target population. 
When using hypothesis testing, you should exercise the same caution in 
distinguishing between these two kinds of population that we suggested in connection 
with interval estimation. 


7.2 HYPOTHESIS TESTING—SOME GENERAL CONSIDERATIONS 

We may define a hypothesis simply as a statement about one or more populations. 
The hypotheses of interest here are those concerned with one or more parameters 
of the population or populations about which we are making the statement. An 
advertising executive may hypothesize that a certain type of newspaper ad attracts 
a larger proportion of readers than some other type of ad. A production supervisor 
may hypothesize that employees trained in a certain way need less time to do a 
task than employees trained in some other way. A quality-control engineer may 
hypothesize that the variance of the measurements generated by some process is 
equal to some specific value $\sigma_0^2$. A marketing analyst may hypothesize that the 
mean family income in a certain area is some specific value $\mu_0$. Or a company 
president may hypothesize that 60% of the company’s employees have completed 
at least one year of college. 

Given enough time, money, and other resources, each of these investigators 
could determine beyond doubt the truth of the hypothesis by examining the entire 
population to which the statement refers. But such an undertaking would cost a 
great deal. So investigators welcome a more economical means of testing the 
reasonableness of their hypotheses. And they are willing to settle for some degree 
of uncertainty in their conclusions. 

The cases just described are typical of situations in which the concepts and 
techniques of sampling work well. The motivation for sampling may be a need 
to obtain estimates of population parameters, as discussed in Chapter 6, or to test 
hypotheses, as we shall see in this chapter. The advantages of sampling mentioned 
in Chapter 6 also apply in hypothesis testing. Therefore we shall not repeat them 
here. Here is a seven-step procedure for hypothesis testing. 



1. Statement of the hypotheses 

2. Identification of the test statistic and its distribution 

3. Specification of the significance level 

4. Statement of the decision rule 

5. Collection of the data and performance of the calculations 

6. Making the statistical decision 

7. Making the administrative decision 

There is nothing sacred about this format. It just breaks the hypothesis-testing 
process into its basic components of acts and decisions. We can then analyze and 
understand each separately. 
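
As a rough illustration of how the seven steps fit together, the sketch below applies them to the company president's hypothesis that 60% of employees have completed at least one year of college. It assumes a large-sample z statistic for a single proportion, which this section has not yet introduced. The function name `proportion_z_test` and the sample counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, alpha=0.05):
    """Two-sided z test of H0: p = p0 against H1: p != p0."""
    # Steps 1 and 2: the hypotheses are as stated above; the test statistic z
    # is approximately standard normal when H0 is true and n is large.
    # Step 3: significance level alpha.
    # Step 4: decision rule, reject H0 if |z| exceeds the critical value.
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Step 5: compute the statistic from the sample data.
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1.0 - p0) / n)
    # Step 6: statistical decision.
    reject = abs(z) > critical
    return z, critical, reject


# Hypothetical sample: 130 of 200 employees have at least one year of college.
z, critical, reject = proportion_z_test(130, 200, p0=0.60)
print(f"z = {z:.2f}, critical = {critical:.2f}, reject H0: {reject}")
# z = 1.44, critical = 1.96, reject H0: False
```

Step 7, the administrative decision, is left out of the code on purpose: it depends on what the statistical decision means for the manager, not on further calculation.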

1. Statement of the hypotheses. You will ordinarily be concerned with two statistical 
hypotheses, the null hypothesis (designated $H_0$) and the alternative hypothesis 
(designated $H_1$). 

The null hypothesis is the hypothesis that is tested. 

The null hypothesis usually specifies one of the parameters of the population of 
interest. For example, the statement, or hypothesis, that 60% of the employees 
in a firm have had at least one year of college specifies that the parameter, the 
proportion of employees with at least one year of college, is 0.60. The null 
hypothesis is the hypothesis assumed to be true throughout the statistical analysis. 
The analysis is based on this assumption. Only after the analysis is complete, and 
there is evidence to warrant our doing so, do we entertain the idea that the null 
hypothesis is not true. 

The term null hypothesis reflects the concept that this is a hypothesis of no 
difference. For this reason, the null hypothesis always includes a statement of 
equality. When it is presented symbolically, it contains an equals sign. 

The alternative hypothesis is the alternative available when the null hypothesis 
has to be rejected. 

In the case of the company president’s hypothesis about the education of the 
employees, we may state the alternative hypothesis in one of three ways: (1) the 
true proportion is not 0.60; (2) the true proportion is greater than 0.60; or (3) the 
true proportion is less than 0.60. In the case of the alternative, the statement of 
the hypothesis implies either a condition of not equal or an inequality. In the first 
case, if we reject the null hypothesis, we conclude that the true condition of the 
population with regard to the parameter is something other than that specified in 
the null hypothesis. However, this alternative does not indicate whether the true
proportion is greater or less than that specified in the null hypothesis. In the second
and third cases, when we reject the null hypothesis, we conclude that the true
proportion is as specified by the alternative hypothesis. Which of the three
alternative hypotheses we use is dictated by the nature of the problem.

An investigator may originally formulate a hypothesis in the null form or as
the alternative. Thus the proportion of employees who have completed at least
one year of college can be stated as 0.6 (null form) or not 0.6 (alternative form).
Regardless of how we state the original hypothesis, we must specify both
appropriate null and alternative hypotheses before we collect any data. In general, if
we hypothesize that a population parameter $\theta$ is equal to some value $\theta_0$, we may
display the null and alternative hypotheses formally as follows:

$$H_0: \theta = \theta_0, \qquad H_1: \theta \neq \theta_0$$

When setting up the null and alternative hypotheses, you must determine what 
you are trying to conclude. You should state this in the alternative hypothesis, 
unless this would violate the rule that the null hypothesis includes a statement of 
equality. You should state what you are trying to conclude in the alternative 
hypothesis because we want to reject the null hypothesis if at all possible. If we 
reject the null hypothesis, we can conclude with a high degree of conviction that 
the alternative hypothesis is true. If we cannot reject the null hypothesis, however, 
we do not conclude that the null hypothesis is true. We merely conclude that it 
may be true. This is because, in general, evidence compatible with a hypothesis 
is never conclusive, whereas contradictory evidence is sufficient to cast doubt on 
a hypothesis. 

Consider an example. A firm that makes a headache remedy claims that the 
product always cures headaches within 15 minutes. This is the firm’s hypothesis. 
You develop a headache and take the remedy. Your headache is gone within 15 
minutes. You would not conclude that the firm’s hypothesis is true. If you had 
similar results with your next 25 headaches, would you conclude that the
hypothesis is true? Although the evidence in favor of the hypothesis is now substantial,
it is not sufficient for concluding that the manufacturer’s hypothesis is true. On
your twenty-seventh headache, relief does not come for 30 minutes. You can then 
conclude that the hypothesis is not true. This conclusion is based on a rejected 
hypothesis. You needed only one piece of contradictory evidence to reach this 
decision. And 26 pieces of evidence in favor of the hypothesis failed to establish 
its truth. 

Decisions about statistical hypotheses of the type we consider here are never 
as clear-cut as in the headache example. The concept, however, is the same. To 
summarize, a decision based on a rejected null hypothesis is more conclusive than 
a decision based on evidence that is compatible with a null hypothesis. You’ll 
realize the truth of this statement as you gain additional insight into the general 
nature of hypothesis testing. [For a further discussion of hypotheses, see the two 
articles by Wilson et al. (1964, 1967).] 

2. Identification of the test statistic and its distribution. A test statistic is one that
is used in statistical hypothesis testing. Generally the test statistic may assume
many possible values. The particular value observed depends on the particular
sample drawn. The test statistic serves as a decision-maker, since the decision to
reject or not reject the null hypothesis depends on its magnitude. One test statistic
is based on

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

with which you are already familiar. Chapter 5 showed that this quantity follows
the standard normal distribution when certain assumptions are met. Some other
test statistics with which we shall be concerned are based on

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

and

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$

both of which follow a t distribution, and

$$z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$$

which is approximately normally distributed when certain conditions are met.
Many of the test statistics that we encounter will be of this form:

$$\text{Test statistic} = \frac{\text{sample statistic} - \text{value of hypothesized parameter}}{\text{standard error of the statistic}}$$
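
The general form above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample figures (110 of 200 employees) are assumptions made for the example, and only the hypothesized proportion of 0.60 comes from the discussion in step 1.

```python
from math import sqrt


def standardized_statistic(sample_statistic, hypothesized_value, standard_error):
    """General form: (sample statistic - hypothesized parameter) / standard error."""
    return (sample_statistic - hypothesized_value) / standard_error


# Hypothetical sample: 110 of 200 employees have at least one year of college.
n = 200
p_hat = 110 / n          # sample proportion (assumed for illustration)
p_0 = 0.60               # hypothesized proportion from the example in step 1

# Standard error of a sample proportion, evaluated at the hypothesized value.
standard_error = sqrt(p_0 * (1 - p_0) / n)

z = standardized_statistic(p_hat, p_0, standard_error)
print(f"z = {z:.2f}")
```

The same function serves for a mean or a difference between means; only the standard error passed to it changes.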

3. Specification of the significance level. When the results are in, there are two
possible actions: (1) reject $H_0$ or (2) fail to reject $H_0$. A hypothesis that is not
rejected may be true or false. Likewise, a rejected hypothesis may be either true
or false. Thus, there are four possible outcomes when we test a hypothesis: (1)
rejecting a false null hypothesis, (2) rejecting a true null hypothesis, (3) failing
to reject a false null hypothesis, and (4) failing to reject a true null hypothesis.
Outcomes (2) and (3) are undesirable. Outcomes (1) and (4) are desirable. We
may classify the possible outcomes by the action taken and by the condition of
the population relative to the null hypothesis. In tabular form, this two-way
classification is:

                          Possible condition of null hypothesis
Possible action           True              False
Fail to reject $H_0$      Correct           Incorrect
Reject $H_0$              Incorrect         Correct


We may think of the two undesirable outcomes as erroneous actions, or errors,
and distinguish them by referring to them by type. That is, we may call the act
of rejecting $H_0$ when it is true a Type I error, and the act of failing to reject $H_0$
when it is false, a Type II error.

Bear in mind that in a hypothesis-testing situation there is always the probability
that you will commit one or the other of these errors. We call the probability of
committing a Type I error $\alpha$, and the probability of committing a Type II error
$\beta$. The larger $\alpha$, the more likely it is that we will commit a Type I error. That
is, the more likely it is that we will reject a true null hypothesis. The larger $\beta$,
the more likely it is that we will commit a Type II error and fail to reject a false
null hypothesis. We would like the probability of committing both errors to be as
small as possible. As the examples will show, for a given sample size, decreasing
$\alpha$ causes an increase in $\beta$. Conversely, decreasing $\beta$ causes an increase in $\alpha$. The
only way to reduce the likelihood of both types of error is to increase the sample 
size. 
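
The trade-off just described can be checked numerically. The sketch below assumes a left-tailed z test with a known standard deviation; the hypothesized mean of 20, the true mean of 19.5, and the standard deviation of 1.5 are illustrative assumptions, as is the function name. It computes the probability of a Type II error for several combinations of significance level and sample size.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_probability(alpha, n, mu_0=20.0, mu_true=19.5, sigma=1.5):
    """Beta for a left-tailed z test of H0: mu >= mu_0 when the true mean is mu_true."""
    standard_error = sigma / sqrt(n)
    # Largest sample mean that still leads to rejection of H0.
    critical_mean = mu_0 + NormalDist().inv_cdf(alpha) * standard_error
    # Beta: probability that the sample mean lands above that cutoff.
    return 1.0 - NormalDist(mu_true, standard_error).cdf(critical_mean)


for alpha, n in [(0.05, 15), (0.01, 15), (0.05, 60)]:
    beta = type_ii_error_probability(alpha, n)
    print(f"alpha = {alpha:.2f}, n = {n:2d}: beta = {beta:.3f}")
```

With the sample size held at 15, lowering the significance level from 0.05 to 0.01 raises the Type II error probability. Holding the significance level at 0.05 and raising the sample size to 60 lowers it.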

As we have noted, $\alpha$ is the probability of committing a Type I error. The
quantity $\alpha$ is also called the level of significance. Before we collect the data, we
specify that the level of significance, or probability of committing a Type I error,
be some small probability. When we have computed the test statistic, we determine
the probability of obtaining a value as extreme as or more extreme than ours when
the null hypothesis is true. If this probability is less than or equal to $\alpha$, we reject
$H_0$ in favor of $H_1$. We then say that the computed value of the test statistic is
significant. If the probability associated with the computed test statistic is greater
than $\alpha$, we cannot reject the null hypothesis. The value of the test statistic is then
not significant. Although we could use any value of $\alpha$ between 0 and 1, the most
common values are 0.05 and 0.01. These choices of $\alpha$, though somewhat arbitrary,
are based on tradition. [For a further consideration of the choice of significance
level, see Labovitz (1968), Meyers and Melcher (1969), Skipper et al. (1968), and
Willis (1972).]

We choose the value of $\alpha$ in reference to the consequences of a Type I error.
Consider the possible consequences of committing the two errors in an actual 
situation. Suppose that a firm that makes calculators has a policy of refusing to 
accept any shipment of microcircuits if there is reason to believe that more than 
7% of them are of inferior quality. The manager selects a sample of microcircuits 
from each shipment and uses an appropriate hypothesis-testing procedure.
Sometimes the hypothesis test indicates that a shipment should be rejected. By rejecting
a shipment, the manager may commit a Type I error. Assume that when a shipment 
is rejected the only alternative is to buy microcircuits from another supplier at a 
higher price. If the manager rejects a shipment of cheaper microcircuits that in 
fact do meet the specifications, and buys more expensive ones, this increases the 
cost of the calculators. A cost increase, then, is the consequence of rejecting a 
true null hypothesis in this case. If, on the other hand, the firm accepts a shipment 
as satisfactory, it may be committing a Type II error. That is, it may be accepting 
a shipment of inferior microcircuits, thus increasing the chance of producing 
inferior calculators. This may result in an added cost later if the firm has to make 
good its warranties. If it places too many inferior calculators on the market, the 
company may also face consumer ill will. 

The manager must decide which of the two errors is more costly, then try to 
either minimize the probability of the more expensive error, or strike a balance 
between the two errors on the basis of the costs involved. 

Note that we select $\alpha$ early in the investigation. If we select $\alpha$ after we have
completed the test, the results may influence our choice, and this would detract 
from the objectivity of the investigation. 

The distribution of a test statistic includes all values that the statistic can assume
when $H_0$ is true. In other words, we can imagine the set of all values of the test
statistic that are possible when the null hypothesis is true. We call the subset of
values of the statistic that are unlikely if the null hypothesis is true the rejection
region. We call the remaining values the acceptance region. (However, avoid
such phrases as “accept the null hypothesis.” The word “accept” implies a
greater degree of conviction than we should accord decisions based on hypotheses
that we cannot reject. When we cannot reject the null hypothesis, we should
characterize our action by saying that we “fail to reject the null hypothesis.”)
We call the values of the test statistic that separate the acceptance region from
the rejection region critical values. The value of $\alpha$ determines the delineation of
rejection and acceptance regions, in conjunction with the value of the hypothesized
parameter and the relevant sampling distribution. This will become clearer when
we consider a specific example.

4. Statement of the decision rule. We may state the decision rule, which is made 
before the data are gathered, in probabilistic terms, as follows: 

If, when the null hypothesis is true, the probability of obtaining a value of the 
test statistic as extreme as or more extreme than the one actually obtained is 
less than or equal to $\alpha$, we reject the null hypothesis. Otherwise, we do not
reject the null hypothesis. 

We may express this rule in terms of the computed test statistic, as follows: 

If the computed value of the test statistic falls in the rejection region, we reject
the null hypothesis. If the computed value of the test statistic falls in the
acceptance region, we do not reject the null hypothesis. If the computed value
of the test statistic is equal to the critical value, we reject the null hypothesis.

Regardless of how we state the decision rule, it will, if followed, lead to the
same decision. [For a good discussion of Type I and Type II errors, see Feinberg
(1971).]
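
That the two statements of the decision rule lead to the same decision can be checked with a short Python sketch. It assumes a right-tailed test based on the standard normal distribution and a significance level of 0.05; the function names and the grid of candidate values are illustrative only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def reject_by_probability(z, alpha):
    """Reject when P(Z >= z | H0 true) is at most alpha (right-tailed test)."""
    upper_tail = 1.0 - STD_NORMAL.cdf(z)
    return upper_tail <= alpha


def reject_by_region(z, alpha):
    """Reject when z is at or beyond the critical value for a right-tailed test."""
    critical_value = STD_NORMAL.inv_cdf(1.0 - alpha)
    return z >= critical_value


alpha = 0.05
candidate_values = [i / 4 for i in range(-12, 13)]  # -3.00, -2.75, ..., 3.00
agreement = all(
    reject_by_probability(z, alpha) == reject_by_region(z, alpha)
    for z in candidate_values
)
print("Both rules agree on every candidate value:", agreement)
```

The probability form compares a tail area with the significance level. The region form compares the statistic with a critical value. Because the critical value is defined as the point that cuts off exactly that tail area, the two comparisons cannot disagree.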

5. Collection of the data and performance of the calculations. Obtain the data to 
be analyzed as part of the decision-making process according to sound scientific 
principles. We cannot stress this point too much. The quality of a final decision 
depends on the quality of the raw data on which it is based. We shall discuss 
ways to improve the quality of basic data by using proper planning techniques in 
more detail in Chapter 8, on analysis of variance, and Chapter 14, on design of 
sample surveys. If the inferential procedures discussed here and in previous
chapters are to be valid, the sample must be random.

Collect the data with the analysis in mind. Plan the analysis in detail before 
collecting the data in accordance with the steps outlined in Chapter 1. The nature 
of the calculations depends on the question being answered or the problem being 
solved. The method of analysis depends on the complexity of the calculations and 
the amount of data to be processed. For simpler problems, a desk calculator may 
be adequate. But for more complicated surveys, involving a large amount of data, 
you may need a computer. 

6. Making the statistical decision. Evaluate the computed test statistic in light of 
the decision rule. The statistical decision consists of rejecting or not rejecting the 
null hypothesis based on this evaluation. 





7. Making the administrative decision. The nature of the administrative decision 
depends on the statistical decision. If we reject the null hypothesis, for example, 
the administrative decision will as a rule be compatible with the alternative
hypothesis. The administrative decision may also take other forms, such as a decision
to gather more data. 

Figure 7.2.1 is a flowchart of the steps in testing a hypothesis. 

When testing hypotheses about population means and proportions, you must 
choose either z or t as the appropriate test statistic. The criteria for choosing 
between the two (as reliability factors) when constructing confidence intervals also 
apply in hypothesis-testing situations. Table 7.2.1 and Figures 7.2.2 and 7.2.3 
summarize these criteria for cases in which the mean and the difference between 
two means are the parameters of interest. 

For further discussion of the areas that are important in formulating hypotheses, 
see Lurie (1958). The following sections are devoted to some specific hypothesis 
tests. The examples and exercises of this chapter assume that the population is 
large enough relative to the sample that the finite population correction can be 
ignored. 


FIGURE 7.2.1 Flowchart for hypothesis-testing procedure

TABLE 7.2.1 Guide to construction of confidence intervals and testing of hypotheses

Column headings: Case; Sample statistic; Parameter; Normal population(s)?; Large
sample size(s)?; Variance(s) known?; Standard error; Test statistic

Cases 1-8: sample statistic $\bar{x}$ or $\bar{d}$; parameter $\mu$ or $\mu_d$
Cases 9-16: sample statistic $\bar{x}_1 - \bar{x}_2$; parameter $\mu_1 - \mu_2$

Case 9: normal populations, yes; large sample sizes, yes; variances known, yes;
standard error $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{(\sigma_1^2/n_1) + (\sigma_2^2/n_2)}$; test statistic z

This table is based on a table designed by Professor Glenn Milligan, Ohio State University.

FIGURE 7.2.2 Flowchart for deciding between z and t when making inferences about
population means [* = use a nonparametric procedure (Chapter 12)]


7.3 THE MEAN OF A NORMALLY DISTRIBUTED POPULATION- 
KNOWN POPULATION VARIANCE 

This section considers examples of hypothesis testing that require the drawing of 
only one sample. We are interested in knowing whether or not the sample drawn 
is likely to have come from a population that has a specified mean. 

EXAMPLE 7.3.1 A mail-order company that deals in small gifts charges a flat rate 
for postage, regardless of the weight of the package. This policy is based on the 
results of a study conducted several years ago. The study revealed that the mean 
weight of mailed packages was 17.5 ounces with a standard deviation of 3.6 
ounces. The total flat postage rate is the current postage rate per ounce times 17.5. 
The company management assumed that in the long run the firm would break 
even on postage costs. The accounting department feels that the mean weight of 
packages being mailed today may not be 17.5 ounces and that the flat rate charged 
perhaps should be changed. It suggests that this hypothesis be tested. The volume 
of business has grown so large that doing a complete study, as was previously 
done, would be impractical. Therefore it is decided to take a random sample of 
the weights of 100 packages mailed and base the decision on the results from the 
sample. It is assumed that the weights of packages are approximately normally 
distributed. 

We can reach a decision by following the seven steps of hypothesis testing. 

FIGURE 7.2.3 Flowchart for deciding between z and t when making inferences about
the difference between two population means

1. Statement of the hypotheses. We wish to reject or not reject the hypothesis that
the mean weight of packages being mailed today is the same as it was previously.
The implied alternative hypothesis makes no suggestion as to the direction of any
change. It merely suggests that the mean is now different from 17.5. We may
state the null and alternative hypotheses symbolically as follows:

$$H_0: \mu = 17.5, \qquad H_1: \mu \neq 17.5$$

We state the two hypotheses in this manner because the firm presumably wants 
to reject the null hypothesis if the mean has either increased or decreased. That 
is, the firm wants to adjust its postage charges up or down, depending on the true 
state of affairs. 

2. Identification of the test statistic and its distribution. Since the parameter of
interest is the population mean $\mu$, the relevant statistic to be computed from the
sample is $\bar{x}$. We know from Chapter 5 that when the sampled population is
normally distributed, the sampling distribution of $\bar{x}$ is normal, with mean $\mu$ and
variance $\sigma^2/n$. The test statistic that we can compute from the sample data,
therefore, is

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

which has the standard normal distribution.

3. Specification of the significance level. Assume that the consequences of
committing a Type I error, rejecting $H_0$ when it is true, are such that we are willing
to take a 1 in 20, or a 5 in 100, chance of committing this type of error. This
decision sets the level of significance at $\alpha = 0.05$. Reference to the null and
alternative hypotheses reveals that both extremely large and extremely small values
of the test statistic will cause rejection of the null hypothesis. This is because the
hypothesized sampling distribution of $\bar{x}$ is centered on 17.5, the hypothesized
value of $\mu$. Values of $\bar{x}$ “far” from 17.5 in either direction, above or below,
cause z to fall in the rejection region. In other words, if the sample yields an
extremely large value of $\bar{x}$, we shall compute an extremely large value of z. And
if we obtain an extremely small value of $\bar{x}$ from the sample data, we shall compute
an extremely small value of z. We know that in the standard normal distribution,
extremely large values of z are located in the right tail and extremely small ones
in the left tail. Half of $\alpha$, therefore, is assigned to each tail of the distribution. A
hypothesis test of this type is called a two-sided test.

The specification of $\alpha$ fixes the line of demarcation between the acceptance
region and the rejection region. In other words, the values of z that have
$\alpha/2 = 0.05/2 = 0.025$ of the area under the standard normal curve to their left and
right, respectively, are -1.96 and +1.96. The rejection region consists of z
values greater than or equal to +1.96 or smaller than or equal to -1.96. The
acceptance region consists of the remaining values of z. Figure 7.3.1 shows the
acceptance and rejection regions for $\alpha = 0.05$.
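
The critical values quoted above come from a table of the standard normal distribution. As a rough cross-check, the sketch below obtains them from the inverse cumulative distribution function in Python's standard library. The function name is an illustrative assumption.

```python
from statistics import NormalDist


def two_sided_critical_values(alpha):
    """Return (lower, upper) z values that leave alpha/2 in each tail."""
    upper = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return -upper, upper


for alpha in (0.05, 0.01):
    lower, upper = two_sided_critical_values(alpha)
    print(f"alpha = {alpha}: reject when z <= {lower:.3f} or z >= {upper:.3f}")
```

For a significance level of 0.05 the computed values agree with -1.96 and +1.96 to two decimal places.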

4. Statement of the decision rule. In the present example there is a two-sided test.
Thus $\alpha = 0.05$ is divided equally between the two tails of the distribution of the
test statistic. We must reflect this in the statement of the decision rule. In
probabilistic terms, we may state the decision rule in two parts.

(a) If the data yield a value of the test statistic so large that the probability of the
occurrence of a value this large or larger when $H_0$ is true is less than or equal to
$\alpha/2 = 0.025$, we reject the null hypothesis.

(b) If the data yield a value of the test statistic so small that the probability of
the occurrence of a value this small or smaller when $H_0$ is true is less than or
equal to $\alpha/2 = 0.025$, we reject the null hypothesis.

As we have already noted, the critical values of the test statistic are ±1.96.
We may state the decision rule in terms of these values as follows: If the computed
value of the test statistic is either greater than or equal to +1.96 or less than or
equal to -1.96, we reject the null hypothesis.

FIGURE 7.3.1 Standard normal distribution, showing acceptance and rejection regions
for $\alpha = 0.05$ for a two-sided test

5. Collection of the data and performance of the calculations. The next step is
to collect the data and perform the calculations. We have already decided to draw
a sample of 100 packages to be weighed. The statistic of interest is the arithmetic
mean $\bar{x}$. Suppose that the value of $\bar{x}$ computed from the sample is 18.4 ounces.
From the sample, we may compute the following value of the test statistic.

$$z = \frac{18.4 - 17.5}{3.6/\sqrt{100}} = \frac{0.9}{0.36} = 2.5$$

6. Making the statistical decision. From the table of the standard normal
distribution, the probability of obtaining a value of the test statistic this large or larger
when the null hypothesis is true is less than 0.025. In fact, the probability of
obtaining a value of 2.5 or larger is 0.0062. According to the decision rule, then,
we reject the null hypothesis. Alternatively, we can say that we reject the null
hypothesis because 2.5 is greater than 1.96.

7. Making the administrative decision. The administrative decision compatible 
with the results of the study is that the mean weight of mailed packages has 
changed. The firm should consider changing the amount charged for postage. 
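
The calculations of Example 7.3.1 can be retraced in Python. The sketch below uses the figures given in the example; the variable names are illustrative, and the standard library's normal distribution stands in for the printed table.

```python
from math import sqrt
from statistics import NormalDist

# Figures taken from Example 7.3.1.
mu_0 = 17.5      # hypothesized mean weight (ounces)
sigma = 3.6      # known population standard deviation (ounces)
n = 100          # sample size
x_bar = 18.4     # observed sample mean (ounces)
alpha = 0.05     # significance level, two-sided

std_normal = NormalDist()

# Step 4: critical value leaving alpha/2 in the upper tail.
critical_value = std_normal.inv_cdf(1.0 - alpha / 2.0)

# Step 5: compute the test statistic.
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Step 6: apply the decision rule.
upper_tail_probability = 1.0 - std_normal.cdf(z)
reject_h0 = abs(z) >= critical_value

print(f"z = {z:.2f}, P(Z >= z) = {upper_tail_probability:.4f}, reject H0: {reject_h0}")
```

The administrative decision of step 7 is not something the code can make. It only reports whether the statistical evidence supports rejecting the null hypothesis.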



Relationship Between Hypothesis Testing and Interval Estimation

At this point let us consider the relationship between hypothesis testing and interval
estimation. Specifically, we can use the interval estimation procedures discussed
in Chapter 6 to test hypotheses. For example, suppose that we wish to test

$$H_0: \mu = \mu_0 \quad \text{against the alternative} \quad H_1: \mu \neq \mu_0$$

for some level of significance $\alpha$. Instead of following the procedure just discussed,
we can test this hypothesis by constructing the $100(1 - \alpha)\%$ confidence interval
for $\mu$. If $\mu_0$ is contained in this interval, we fail to reject $H_0$. If, on the other
hand, $\mu_0$ is not contained in the interval, we reject $H_0$.

We can illustrate this by using the data of Example 7.3.1. The 95% confidence
interval for $\mu$ in this example is

$$\bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}}$$
$$18.4 \pm 1.96 \frac{3.6}{\sqrt{100}}$$
$$18.4 \pm 0.7$$
$$17.7, \; 19.1$$

Since the interval does not contain $\mu_0 = 17.5$, we reject $H_0$. This is the same
result we obtained by following the seven-step hypothesis-testing procedure.

The test illustrated by Example 7.3.1 is a two-sided test. Now here is an
example of a hypothesis test when a one-sided test is appropriate.

EXAMPLE 7.3.2 The quality-control department of a food-processing firm specifies
that the mean net weight per package of cereal should not be less than 20 ounces.
Experience has shown that the weights are approximately normally distributed
with a standard deviation of 1.5 ounces. A random sample of 15 packages yields
a mean weight of 19.5 ounces. Is this sufficient evidence to indicate that the true
mean weight of the packages has decreased?

We shall use a hypothesis test to help us answer this question.

1. Statement of the hypotheses. We can say that the sample data provide sufficient
evidence that the mean has decreased if we can reject the null hypothesis that the
mean has either remained the same or increased. This reasoning suggests the
following hypotheses:

$$H_0: \mu \geq 20, \qquad H_1: \mu < 20$$

Note one way in which these hypotheses differ from the hypotheses for the
two-sided test: In the two-sided test, the null hypothesis specifies only one value of
the parameter, whereas the null hypothesis of the one-sided test specifies a large
number of values. Theoretically, then, in the case of a one-sided test, a large
number of tests would be needed. Generally we perform only one test: a test at
the point of equality. It can be shown, however, that if we reject $H_0$ at the point
of equality, we will also reject $H_0$ for any other value implied by the null
hypothesis.

2. Identification of the test statistic and its distribution. Since the population is
approximately normally distributed and we know the population standard
deviation, we can compute the following test statistic:

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

3. Specification of the significance level. Assume that a 0.05 level of significance
is satisfactory.

4. Statement of the decision rule. The fact that there is an inequality in the
alternative hypothesis indicates that this is a one-sided test. All of $\alpha = 0.05$,
therefore, will be in one tail of the distribution of the test statistic. Since computed
values of the test statistic that are relatively small will cause rejection of the null
hypothesis, the region of rejection will be in the left tail. The critical value of z,
then, is that value of z to the left of which lies 0.05 of the area under the standard
normal curve. Appendix Table C shows the critical value of the test statistic to
be -1.645. We may state the decision rule as follows: If the value of z computed
from the sample data is less than or equal to -1.645, we reject $H_0$. Otherwise
we do not reject $H_0$. Figure 7.3.2 shows the acceptance and rejection regions for
this example.

5. Collection of the data and performance of the calculations. A random sample
of size 15 yielded a mean of 19.5. From these data we compute the following
value of the test statistic:

$$z = \frac{19.5 - 20}{1.5/\sqrt{15}} = -1.29$$

6. Making the statistical decision. Since -1.29 is greater than -1.645, we
cannot reject the null hypothesis.

7. Making the administrative decision. Even though the sample mean is less than
20, the test result does not provide sufficient evidence to indicate that the true
mean has decreased.

p Values

In scientific journals, researchers usually report, as part of their research findings,
a quantity known as the p value. A p value is a probability associated with a
statistical hypothesis test.


FIGURE 7.3.2 Standard normal distribution, showing acceptance and rejection regions
for $\alpha = 0.05$ for a one-sided test




A p value is the probability of obtaining a value of the test statistic as extreme 
as or more extreme (in the appropriate direction) than that actually obtained, 
given that the tested null hypothesis is true. It is also the smallest level of 
significance at which $H_0$ can be rejected.

When you read an article in The Journal of Marketing Research, for example, 
you are likely to see such compact statements as p < 0.01, 0.025 < p < 0.05, 
and so on. The statement p < 0.01, for example, tells you that if the null
hypothesis is true, the probability of obtaining a value of the test statistic (such as
z) as extreme as or more extreme than that actually observed is less than 0.01.
We interpret such a finding as evidence supporting the rejection of the null
hypothesis and the acceptance of the alternative hypothesis. Hodges and Lehmann
(1970) refer to the p value as “a measure of the degree of surprise which the 
experiment should cause a believer of the null hypothesis.” The smaller the p 
value, the greater the surprise of the believer of the null hypothesis. 

A variety of symbols are used for the p value. We shall use the lower-case 
letter p, since this seems to be the most common. Do not confuse this p with the 
p used as the symbol for a population proportion. The context in which it appears 
will always make it clear whether p refers to the p value or to a population 
proportion. 

Calculating a p Value. The p value associated with a given hypothesis test depends
on three conditions: (1) the test statistic used, (2) the magnitude of the computed
value of the test statistic, and (3) whether the alternative hypothesis is one-sided
or two-sided. We find p values in a table of the applicable test statistic. As an
example, let us refer to Example 7.3.2, in which we had a one-sided test and we
computed the value of the test statistic as z = -1.29. From Table C, the
probability of obtaining a value of z as small as or smaller than -1.29, if the null
hypothesis is true, is equal to 0.0985. Since 0.0985 is greater than our chosen
level of significance, 0.05, we would not reject $H_0$. Figure 7.3.3 shows the p
value for Example 7.3.2. 
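
The one-sided test of Example 7.3.2 and its p value can be retraced with the following sketch. The figures come from the example; the variable names are illustrative, and the standard library's normal distribution takes the place of Table C.

```python
from math import sqrt
from statistics import NormalDist

# Figures taken from Example 7.3.2.
mu_0 = 20.0      # mean net weight at the point of equality (ounces)
sigma = 1.5      # known population standard deviation (ounces)
n = 15           # sample size
x_bar = 19.5     # observed sample mean (ounces)
alpha = 0.05     # significance level, all in the left tail

std_normal = NormalDist()

z = (x_bar - mu_0) / (sigma / sqrt(n))

# Left-tailed test: the p value is the area to the left of the computed z.
p_value = std_normal.cdf(z)
critical_value = std_normal.inv_cdf(alpha)

reject_h0 = p_value <= alpha
print(f"z = {z:.2f}, critical value = {critical_value:.3f}")
print(f"p value = {p_value:.4f}, reject H0: {reject_h0}")
```

Because the code does not round z to two decimal places before finding the area, its p value may differ from the table value in the fourth decimal place. The decision is unaffected.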

When the alternative hypothesis is two-sided and the distribution of the test
statistic is symmetric, as in the case of the z distribution, we double the p value
that would apply if the alternative hypothesis were one-sided. A two-sided
alternative hypothesis, you will recall, allows for a difference from the null hypothesis
in either direction. That is, either a sufficiently large or a sufficiently small value
of the test statistic causes rejection of the null hypothesis. Doubling the one-sided
p value reflects this characteristic of a two-sided hypothesis test.

FIGURE 7.3.3 p value for Example 7.3.2

Let us refer to Example 7.3.1, in which the test was two-sided and the computed 
value of the test statistic was 2.5. The sample data resulted in a test statistic that 
was located in the right tail of the distribution. However, we did not know that 
this would happen when we set up the test. To allow for the possibility of the test 
statistic’s falling in either tail, we made the test two-sided. Just as $\alpha$ is divided
between the two tails, the p value must also come from both tails. In this case
the p value is equal to the area under the z curve to the right of +2.5 plus the
area to the left of -2.5. When we consult Table C, we find that p = 0.0062 +
0.0062 = 0.0124. Since 0.0124 is less than 0.05, we reject $H_0$. Figure 7.3.4
shows the p value for Example 7.3.1. 
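
For the two-sided test of Example 7.3.1, the sketch below doubles the one-tail area to obtain the p value. It also builds the 95% confidence interval for the mean and checks that the interval leads to the same decision. The figures are those of the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Figures taken from Example 7.3.1.
mu_0, sigma, n, x_bar, alpha = 17.5, 3.6, 100, 18.4, 0.05

std_normal = NormalDist()
standard_error = sigma / sqrt(n)
z = (x_bar - mu_0) / standard_error

# Two-sided p value: double the area beyond |z| in one tail.
p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))

# 100(1 - alpha)% confidence interval for the mean.
margin = std_normal.inv_cdf(1.0 - alpha / 2.0) * standard_error
lower, upper = x_bar - margin, x_bar + margin

reject_by_p_value = p_value <= alpha
reject_by_interval = not (lower <= mu_0 <= upper)

print(f"p value = {p_value:.4f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
print("Same decision from both approaches:", reject_by_p_value == reject_by_interval)
```

The interval printed here carries two decimal places. Rounded to one decimal place it matches the limits of 17.7 and 19.1 obtained earlier.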

Advantage of Reporting p Values. When researchers report a p value as part of
their research findings, we can set our own level of significance. We can then use 
our own criterion to reject or not reject the null hypothesis, rather than that of the 
researcher. The researcher who reports merely that the null hypothesis was rejected 
at, say, the 0.05 level is withholding information. This deprives the reader of the 
ability to make an independent decision on whether or not to reject the null 
hypothesis. [For more detailed discussions of p values, see the articles by Bahn 
(1972), Daniel (1977), and Gibbons and Pratt (1975).] 


Exercises 




Carry out the seven-step hypothesis-testing procedure at the indicated level of significance 
and compute the p value for each test. 

7.3.1 Suppose that a population is normally distributed with a standard deviation of 50.
A random sample of size 25 is drawn from the population and a sample mean of 70 is
computed. Test at the 0.01 level of significance the null hypothesis that \(\mu = 100\).

7.3.2 A manufacturer of bolts claims that the mean length is 4.500 in. with a standard
deviation of 0.020 in. A random sample of 16 bolts yields a mean of 4.512 in. Do these
data provide sufficient evidence to indicate that the true mean length is greater than the
manufacturer claims? Assume that the dimensions are normally distributed. Let \(\alpha = 0.01\).

7.3.3 A manufacturer of chemicals produces a certain compound by adding distilled water
to fixed amounts of other ingredients. The amount of water needed depends on the purity
of the other ingredients. As a result of using quality-control techniques, the manufacturer
has determined that the mean amount of water needed to meet product standards is 6 liters
with a standard deviation of 1 liter. A random sample of 9 batches required, on average,
7 liters of water. Do these data provide sufficient evidence to indicate that quality-control
standards are not being met? Let \(\alpha = 0.05\).

FIGURE 7.3.4 p value for Example 7.3.1

7.3.4 A psychologist is conducting a research project in which the subjects are employees
with a certain type of physical handicap. On the basis of past experience, the psychologist
believes that the mean sociability score of the population of employees with this handicap
is greater than 80. The population of scores is known to be approximately normally
distributed, with a standard deviation of 10. A random sample of 20 employees selected from
the population yields the following results: 99, 69, 91, 97, 70, 99, 72, 74, 74, 76, 96,
97, 68, 71, 99, 78, 76, 78, 83, 66. The psychologist wants to know whether this sample
result provides sufficient evidence to indicate that this belief about the population mean
sociability score is correct. Let \(\alpha = 0.05\).


7.4 THE MEAN OF A NORMALLY DISTRIBUTED POPULATION:
UNKNOWN POPULATION VARIANCE

We often need to test hypotheses about population means when we do not know
the population variance. In such cases, even though the population may be
approximately normally distributed, we cannot compute the test statistic

\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]

because we do not know \(\sigma\). When this is the case, we use the test statistic

\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]

As we have seen, this is distributed as Student’s t with n - 1 degrees of freedom.

In all respects other than choice of test statistic, the hypothesis-testing procedure 
appropriate under these conditions is the same as that outlined in Section 7.3. 

EXAMPLE 7.4.1 A tire manufacturer claims that the average life of a certain grade 
of tire is greater than 25,000 miles under normal driving conditions on a car of a 
certain weight. A random sample of 15 tires is tested. A mean and standard 
deviation of 27,000 and 5000 miles, respectively, are computed. Assume that the 
lives of the tires in miles are approximately normally distributed. Can we conclude 
from these data that the manufacturer’s product is as good as claimed? 

1. Statement of the hypotheses. We are asking whether we can conclude that \(\mu\),
the true mean, is greater than 25,000. Thus a statement to this effect should go
in the alternative hypothesis. The appropriate hypotheses, then, are

\[ H_0\colon \mu \le 25{,}000, \qquad H_1\colon \mu > 25{,}000 \]

2. Identification of the test statistic and its distribution. The population is
approximately normally distributed and the population standard deviation is
unknown. Therefore the appropriate test statistic is

\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]

This is distributed as Student’s t with n - 1 degrees of freedom.

3. Specification of the significance level. Assume that a significance level of 0.05
is satisfactory.

4. Statement of the decision rule. The test is a one-sided test. Since only relatively
large values of t will cause us to reject \(H_0\), the region of rejection is in the upper
tail of the distribution. The critical value of the test statistic, then, is the value of
t with n - 1 = 14 degrees of freedom that has to its right 0.05 of the area under
the curve of t. From Appendix Table E, we find this value to be 1.7613. The
decision rule, then, may be stated as follows: If the computed value of t is greater
than or equal to 1.7613, we reject \(H_0\).

5. Collection of the data and performance of the calculations. From the
information given in the statement of the problem, we compute the following value of
the test statistic:

\[ t = \frac{27{,}000 - 25{,}000}{5000/\sqrt{15}} = 1.55 \]

6. Making the statistical decision. Since 1.55 < 1.7613, we cannot reject \(H_0\).

7. Making the administrative decision. Since we do not reject the null hypothesis,
the data do not support the conclusion that the true mean life of the tires is greater
than 25,000 miles, as the manufacturer claims. Any action the tire firm takes that is
incompatible with the hypothesis that \(\mu \le 25{,}000\) would not be warranted on the
basis of these data.

Calculating the p Value

Readily available tables of the t distribution do not provide us with enough detail
to determine the exact p value associated with the computed value of the test
statistic. The table of Student’s t distribution in Table E, for example, gives values
of t only for selected percentiles: 0.90, 0.95, 0.975, 0.99, and 0.995. Unless the
computed value of t happens to be exactly equal to a tabulated value, we cannot
determine an exact p value from these tables. Thus, when the test statistic is a
Student’s t, the p value is usually reported as an interval, for example, p < 0.05
or 0.025 < p < 0.05.

Let us consider the value t = 1.55 that we computed in Example 7.4.1. When 
we enter Table E with 14 degrees of freedom, we find that 1.55 falls between 
1.345 and 1.7613. If H 0 is true, the probability of obtaining a value of t as large 
as or larger than 1.345 is 0.10. The probability of obtaining a value as large as 
or larger than 1.7613 is 0.05. Then the probability, if H 0 is true, of obtaining a 
value of t as large as or larger than 1.55 is somewhere between 0.10 and 0.05. 



That is, for this test, 0.10 > p > 0.05. Figure 7.4.1 shows the calculation of the 
p value for Example 7.4.1. 
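
Where a computer is available, the interval obtained from Table E can be narrowed. The sketch below is one way to do so using only Python's standard library; it is not a method from the text. The helper names t_pdf and t_upper_tail are hypothetical, and the tail area is approximated by Simpson's rule applied to the Student's t density.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 by Simpson's rule on [0, t].

    steps must be even; the helper is an illustrative assumption.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the total area lies to the right of 0; subtract the part up to t.
    return 0.5 - total * h / 3

# Figures from Example 7.4.1.
n, x_bar, s, mu_0 = 15, 27_000, 5_000, 25_000
t = (x_bar - mu_0) / (s / math.sqrt(n))
p = t_upper_tail(t, df=n - 1)
print(round(t, 2), round(p, 3))
```

The computed p value should fall inside the interval 0.10 > p > 0.05 found from the table.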


Paired Observations

Chapter 6 showed that in statistical inference, interest may focus on a population
mean difference. It explained the construction of a confidence interval for the
mean difference by using paired sample observations. We may also test hypotheses
about the mean difference \(\mu_d\) in a manner like that described in this section. We
sometimes call such a test a paired comparisons test or a paired difference test.
Recall that the data for analysis consist of sample differences \(d_i = x_{1i} - x_{2i}\),
where \(x_{1i}\) and \(x_{2i}\) are observations taken on the ith pair of subjects under condition
1 and condition 2, respectively.

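We may formulate any one of the following pairs of hypotheses:

1. \(H_0\colon \mu_d = \mu_{d0}\), \(H_1\colon \mu_d \ne \mu_{d0}\)

2. \(H_0\colon \mu_d \le \mu_{d0}\), \(H_1\colon \mu_d > \mu_{d0}\)

3. \(H_0\colon \mu_d \ge \mu_{d0}\), \(H_1\colon \mu_d < \mu_{d0}\)

When the population is normally distributed and the true variance of the
difference is known, the test statistic is

\[ z = \frac{\bar{d} - \mu_{d0}}{\sigma_d/\sqrt{n}} \tag{7.4.2} \]

When the variance is unknown, the test statistic is

\[ t = \frac{\bar{d} - \mu_{d0}}{s_{\bar{d}}} \tag{7.4.3} \]

where \(s_{\bar{d}} = s_d/\sqrt{n}\). In practice, the most frequently used value of \(\mu_{d0}\) is 0.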

EXAMPLE 7.4.2 Nine pairs of salespeople are matched as to age, years of experience,
level of initiative, and other variables. One member of each pair is randomly
assigned to a training course taught by Method A. The other is assigned to the
same type of training course taught by Method B. At the end of the course, each
salesperson is given an examination to test retention of the material presented.
Table 7.4.1 shows the results.

TABLE 7.4.1 Scores made by nine pairs of salespersons, each member of which was
trained by a different method

Pair   Method A   Method B   d_i = x_Ai - x_Bi
  1       90         85              5
  2       95         88              7
  3       87         87              0
  4       85         86             -1
  5       90         82              8
  6       94         82             12
  7       85         70             15
  8       88         72             16
  9       92         80             12

The investigator wishes to know whether Method A is better than Method B.
If the methods are equally effective, we would expect, in the long run, to observe
an equal number of differences above and below 0. That is, we would expect the
true mean difference \(\mu_d\) to be 0. If, however, Method A is better than Method
B, we would expect, in the long run, to find that Method A scores are higher than
Method B scores. In this case \(\mu_d\), the mean of all \(d_i = x_{Ai} - x_{Bi}\), will be greater
than 0. The investigator in this example is asking whether this is so.

We carry out the hypothesis test by means of the following familiar steps.

1. Statement of the hypotheses. Since, in Table 7.4.1, \(d_i = x_{Ai} - x_{Bi}\), we have
the following hypotheses:

\[ H_0\colon \mu_d \le 0, \qquad H_1\colon \mu_d > 0 \]


2. Identification of the test statistic and its distribution. Assume that the
population of differences is approximately normally distributed. Then the test statistic
is given by Equation 7.4.3.

3. Specification of the significance level. Let \(\alpha = 0.05\).

4. Statement of the decision rule. \(H_1\) implies a one-sided test, with the rejection
region in the upper tail of the distribution of t. Since there are 9 paired
observations, we have 9 - 1 = 8 degrees of freedom, and the critical value of t is
1.8595. If the computed t is greater than or equal to 1.8595, we reject \(H_0\).
Otherwise we do not reject \(H_0\).

5. Collection of the data and performance of the calculations. From the data of
Table 7.4.1, we compute

\[ \bar{d} = \frac{\sum d_i}{n} = \frac{5 + 7 + \cdots + 12}{9} = \frac{74}{9} = 8.2 \]

\[ s_d = \sqrt{\frac{n\sum d_i^2 - \left(\sum d_i\right)^2}{n(n - 1)}}
       = \sqrt{\frac{9(5^2 + 7^2 + \cdots + 12^2) - (74)^2}{(9)(8)}} = 6.12 \]

\[ s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \frac{6.12}{\sqrt{9}} = 2.04 \]

By Equation 7.4.3, we compute the following value of the test statistic:

\[ t = \frac{8.2 - 0}{2.04} = 4.02 \]

6. Making the statistical decision. Since the computed value of t exceeds the
critical value of t, we reject \(H_0\).

7. Making the administrative decision. Since we reject H 0 , we conclude that on 
the basis of these data, Method A instruction is superior to Method B. Since 
4.02 > 3.3554, p < 0.005. 
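
The arithmetic of Example 7.4.2 can be reproduced as follows. This is a minimal sketch using Python's standard library; the variable names are illustrative, and the critical value 1.8595 is taken from the text rather than computed. Because the sketch carries unrounded intermediate values, its t comes out near 4.03, slightly above the 4.02 obtained in the text from the rounded figures 8.2 and 2.04.

```python
import math
from statistics import mean, stdev

# Scores from Table 7.4.1.
method_a = [90, 95, 87, 85, 90, 94, 85, 88, 92]
method_b = [85, 88, 87, 86, 82, 82, 70, 72, 80]

# Paired differences d_i = x_Ai - x_Bi.
d = [a - b for a, b in zip(method_a, method_b)]
n = len(d)

d_bar = mean(d)                  # mean difference
s_d = stdev(d)                   # sample standard deviation (divisor n - 1)
s_d_bar = s_d / math.sqrt(n)     # standard error of the mean difference
t = (d_bar - 0) / s_d_bar        # Equation 7.4.3 with mu_d0 = 0

critical_t = 1.8595              # 8 degrees of freedom, alpha = 0.05 (from the text)
reject_h0 = t >= critical_t
print(round(d_bar, 2), round(s_d, 2), round(s_d_bar, 2), round(t, 2), reject_h0)
```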


Remember that this hypothesis-testing procedure rests on the assumption that 
the distribution of differences is at least approximately normal. All is not lost, 
however, if this assumption is not tenable. We may resort to one of the other two 
alternatives. If the sample size is equal to or greater than 30, we can use the 
procedures of Section 7.5 regardless of the form of the population of differences. 
If the population of differences is not at least approximately normally distributed, 
and it is not possible to draw a large sample, we may use a test known as the 
sign test. This test, which does not depend on the functional form of the parent 
population, will be discussed in Chapter 12. 

Carry out the seven-step hypothesis-testing procedure at the indicated level of significance 
and compute the p value for each test. 

7.4.1 The mean operating temperature of a heat-sensing device, according to the
manufacturer, is 190°F. A mean and standard deviation of 195° and 8°, respectively, are
computed from the operating temperatures of a random sample of 16 devices. Do these data
provide sufficient evidence to indicate that the mean operating temperature is higher than
claimed? Let \(\alpha = 0.05\), and assume that operating temperatures are approximately
normally distributed.

7.4.2 A petroleum company has developed a gasoline additive that it feels will improve
gasoline mileage. In order to get information to support the planned marketing program,
the firm hires a testing organization to conduct a paired-comparisons test involving 16
pairs of cars. Each pair is identical with respect to make, model, engine size, and other
relevant characteristics. One car of each pair is randomly selected and driven over a test
course using gasoline with the additive. The other car of the pair is driven over the same
course using a comparable gasoline without the additive. The mileage per gallon on the
test course is shown here for all cars tested. Use the difference in gasoline mileage for a
pair of cars as the variable of interest. Do the data provide sufficient evidence to indicate
that the additive does increase gas mileage? Let \(\alpha = 0.05\).

Pair #   With additive (X1)   Without additive (X2)
   1           17.1                  16.3
   2           12.7                  11.6
   3           11.6                  11.2
   4           15.8                  14.9
   5           14.0                  12.8
   6           17.8                  17.1
   7           14.7                  13.4
   8           16.3                  15.4
   9           10.8                  10.1
  10           14.9                  13.7
  11           19.7                  18.3
  12           11.4                  11.0
  13           11.4                  10.5
  14            9.3                   8.7
  15           19.0                  17.9
  16           10.1                   9.4




7.4.3 A study is conducted to investigate how effective street lighting placed at various
locations is in reducing automobile accidents in a certain town. The following table shows
the median number of nighttime accidents per week at 12 locations one year before and
one year after the installation of lighting. Do these data provide sufficient evidence to
indicate that lighting does reduce nighttime automobile accidents? Let \(\alpha = 0.05\).

Location      A    B    C    D    E    F    G    H    I    J    K    L
No. before    8   12    5    4    6    3    4    3    2    6    6    &
No. after     5    3    2    1    4    2    2    4    3    5    4    3



7.4.4 A random sample of 25 hamburger patties sold by a fast-food restaurant yields a
mean weight of 3.8 ounces with a standard deviation of 0.5 ounce. Can we conclude from
these data that the population mean is less than 4 ounces? Let \(\alpha = 0.05\). Weights of
hamburger patties are approximately normally distributed.

7.4.5 The following are the weights of a random sample of 10 employees working in the
shipping department of a wholesale grocery firm: 154, 154, 186, 243, 159, 174, 183, 163,
192, 181. On the basis of these data, can we conclude that the firm’s shipping department
employees have a mean weight greater than 160 lb? Let \(\alpha = 0.05\).

7.4.6 A hospital administrator states that emergency-room charges for a certain procedure
must average at least $25 if the hospital is not to lose money on its emergency service.
The hospital charged the following amounts for treating a sample of patients with the
procedure in the emergency room during a one-year period (rounded to the nearest dollar):
26, 20, 33, 25, 27, 30, 23, 27, 22, 38, 51, 60, 38, 56, 31. On the basis of these data,
can we conclude at the 0.01 significance level that the mean charge for the sampled
population of patients is greater than $25?


7.5 THE MEAN OF A POPULATION THAT IS NOT
NORMALLY DISTRIBUTED

Needless to say, not all populations are normally, or even approximately normally,
distributed. Suppose that the sample on which a hypothesis test is based
has been drawn from a population that is not normally distributed. If the sample
is large (say, \(n \ge 30\)), we take advantage of the central limit theorem and use

\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]

as the test statistic. If we do not know the standard deviation of the population,
we use the sample standard deviation as an estimate. We reason that the large
sample, necessary for the central limit theorem to apply, will yield a satisfactory
estimate of \(\sigma\).

EXAMPLE 7.5.1 A market research firm is interested in the amount that households 
of a certain town spend on groceries each week. The firm believes that the average 
amount spent per household each week is less than $40. A random sample of 100 
households yields a mean of $38 and a standard deviation of $10. Do these data 
support the firm’s belief? 



We can use the results of a hypothesis test to answer the question. 

1. Statement of the hypotheses

\[ H_0\colon \mu \ge \$40, \qquad H_1\colon \mu < \$40 \]

2. Identification of the test statistic and its distribution. The functional form of
the population is not specified. However, since the sample size is large, we know
that the sampling distribution of \(\bar{x}\) is at least approximately normally distributed
because of the central limit theorem. If \(\sigma\) were known, the test statistic would be

\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]

However, since the sample size is large, it ought to yield a satisfactory estimate
of \(\sigma\). The test statistic we compute, then, is

\[ z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]

We assume that this statistic will follow a normal distribution well enough for us
to use a value from the standard normal distribution as the critical value of the
test statistic.

3. Specification of the significance level. Let \(\alpha = 0.05\).

4. Statement of the decision rule. If the computed value of z is less than or equal
to -1.645, we reject \(H_0\).

5. Collection of the data and performance of the calculations. From the
information given, the computed value of the test statistic is

\[ z = \frac{\$38 - \$40}{\$10/\sqrt{100}} = -2.0 \]

6. Making the statistical decision. Since -2.0 < -1.645, we reject \(H_0\).

7. Making the administrative decision. Since we reject H 0 , we conclude that the 
data support the firm’s belief. For this test, p = 0.0228. 
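
The steps of Example 7.5.1 can be traced numerically. The sketch below assumes Python's standard-library NormalDist as a stand-in for the normal table; the variable names are illustrative. It computes the lower-tail critical value for the chosen significance level rather than reading -1.645 from a table.

```python
import math
from statistics import NormalDist

# Figures from Example 7.5.1; s estimates the unknown population sigma.
n, x_bar, s, mu_0, alpha = 100, 38.0, 10.0, 40.0, 0.05

z = (x_bar - mu_0) / (s / math.sqrt(n))     # large-sample test statistic
critical_z = NormalDist().inv_cdf(alpha)    # lower-tail critical value
p = NormalDist().cdf(z)                     # area to the left of z
reject_h0 = z <= critical_z
print(round(z, 2), round(critical_z, 3), round(p, 4), reject_h0)
```

The printed values should match the example: z = -2.0, a critical value of -1.645, and p = 0.0228.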


Exercises 



Carry out the seven-step hypothesis-testing procedure at the indicated level of significance 
and compute the p value for each test. 

7.5.1 An accountant for a certain firm has been told that in her section of the country the
average weekly salary of typists is $175. She wants to know whether she should doubt
this information. She calls on you for help. You decide to use the procedures of hypothesis
testing. Use the seven steps of hypothesis testing to arrive at a decision. Let \(\alpha = 0.05\),
use a sample of size 100, and use the sample standard deviation, s = $25, to estimate the
population standard deviation.

If your sample yielded an \(\bar{x}\) of $170, what advice would you give the accountant?

7.5.2 A real-estate agent claims that the average value of homes in a certain neighborhood 
is greater than $45,000. A random sample of 36 homes has a mean value of $48,000 and 
a standard deviation of $12,000. Do these data support the agent’s claim at the 0.05 level 
of significance? 





7.5.3 The manager of a shopping mall hypothesizes that cars in the parking lot remain
there, on the average, more than 90 minutes on weekends. A random sample of 100 cars
arriving on weekends yields a mean parking time of 96 minutes with a standard deviation
of 30 minutes. Do these data provide sufficient evidence to support the manager’s
contention? Let \(\alpha = 0.05\).

7.5.4 An industrial psychologist who serves as consultant to many electronics firms has
accused production supervisors of promoting unskilled assembly-line employees to a
certain job for which they have no aptitude. A random sample of 40 of these employees
yielded the following aptitude scores.

73  57  96  78  74  50  65  46  63  82
92  50  42  46  86  40  57  78  66  84
42  55  44  91  91  60  97  79  85  79
81  81  83  64  76  96  94  70  70  81

The population variance is known to be 280, and the population of scores is not normally
distributed. The production supervisors contend that the mean aptitude score of all the
employees they promoted is greater than 60. Do these data provide sufficient evidence to
support their claim? Let \(\alpha = 0.05\). Find the p value.

7.5.5 A firm that makes roofing tar wants the percentage of impurities not to exceed an
average of 3%. A random sample of 30 one-gallon cans yields the following percentages
of impurities.

3  3  1  1  0.5  2  2  4  5  4  5     3    1  3  1
4  1  1  4  2    5  3  1  1  1  0.75  1.5  3  3  2

On the basis of these data can one conclude that the population mean is less than 3 percent?
Let \(\alpha = 0.01\).


7.6 THE DIFFERENCE BETWEEN THE MEANS OF TWO NORMALLY 
DISTRIBUTED POPULATIONS 

The difference between two population means often interests researchers and managers.
If we do not have direct knowledge of the true parameters, we make an
inference on the basis of sample data. Two independent random samples, one
from each of two populations, provide data on which we base the inference. In
the most common situation involving the difference between two population means,
we want to find whether or not it is reasonable to conclude that the two are not
equal. In this situation the test may be either one-sided or two-sided. In the latter
case, the hypotheses are of the form

\[ H_0\colon \mu_1 - \mu_2 = 0, \qquad H_1\colon \mu_1 - \mu_2 \ne 0 \]

and in the former case, the hypotheses are formulated as

\[ H_0\colon \mu_1 - \mu_2 \le 0, \qquad H_1\colon \mu_1 - \mu_2 > 0 \]

or

\[ H_0\colon \mu_1 - \mu_2 \ge 0, \qquad H_1\colon \mu_1 - \mu_2 < 0 \]

However, in these hypotheses we can replace 0 with any value of interest. For
example, we might want to test the null hypothesis that \(\mu_1 - \mu_2 = 10\).

We shall discuss hypothesis tests involving the difference between two population
means under three different circumstances: (1) when sampling is from normally
distributed populations with known population variances; (2) when sampling
is from normally distributed populations with unknown population variances; and
(3) when sampling is from populations that are not normally distributed. The first
two situations are discussed in this section. The third will be covered in Section
7.7.

Known Population Variances

For testing the difference between two population means when the populations
are normally distributed and the population variances are known, the appropriate
test statistic is based on

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}
            {\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]

This follows from our knowledge of the sampling distribution of \(\bar{x}_1 - \bar{x}_2\), the
difference between two sample means. Neither the sample sizes nor the variances
need be equal.


EXAMPLE 7.6.1 Two procedures can be used to manufacture wire. Experience has
shown that the tensile strengths that result from both procedures are approximately
normally distributed. The standard deviation for Procedure 1 is 6 psi. For
Procedure 2 the standard deviation is 8 psi. Management wishes to know whether
the mean tensile strengths of wire produced by the two methods are different.
We can decide by means of the following hypothesis test.

1. Statement of hypotheses

\[ H_0\colon \mu_1 - \mu_2 = 0, \qquad H_1\colon \mu_1 - \mu_2 \ne 0 \]

2. Identification of the test statistic and its distribution. The test statistic is

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}
            {\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]

which is distributed as the standard normal.

3. Specification of the significance level. Let \(\alpha = 0.05\).

4. Statement of the decision rule. Reject \(H_0\) if the computed value of the test
statistic is greater than or equal to +1.96 or less than or equal to -1.96.

5. Collection of the data and performance of the calculations. A random sample
of 12 pieces of wire made by Procedure 1 gives a mean of 40 psi. A random
sample of 16 pieces made by Procedure 2 yields a mean of 34 psi. From these
data we can compute the following value of the test statistic:

\[ z = \frac{(40 - 34) - 0}{\sqrt{\dfrac{6^2}{12} + \dfrac{8^2}{16}}} = 2.27 \]

6. Making the statistical decision. Since 2.27 > 1.96, we reject \(H_0\).

7. Making the administrative decision. On the basis of the sample data, we
conclude that the two population means are different. That is, we conclude that the
two procedures, on the average, do not yield wire with the same tensile strength.
For this test, p = 2(1 - 0.9884) = 2(0.0116) = 0.0232.

Unknown Population Variances

In testing a hypothesis as to the difference between the means of two normally
distributed populations when the population variances are unknown, we distinguish
between two cases: (1) the case in which the population variances are equal,
and (2) the case in which they are not equal. We made the same distinction in
Chapter 6 when we discussed interval estimation of the difference between two
population means. Let us consider each case separately.

Equal Variances Suppose that the population variances, though unknown, are
equal. Then the correct test statistic for testing hypotheses about the difference
between the means of two normally distributed populations is based on

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}
            {\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}} \tag{7.6.2} \]

where

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]

is the pooled estimate of the common population variance.


EXAMPLE 7.6.2 Two machines are used in the making of steel rings. The quality-control
department asks whether it should conclude that Machine 1 is producing
rings with a larger inside diameter than Machine 2. Assume that the diameters
are approximately normally distributed and \(\sigma_1^2 = \sigma_2^2\).

1. Statement of hypotheses

\[ H_0\colon \mu_1 - \mu_2 \le 0, \qquad H_1\colon \mu_1 - \mu_2 > 0 \]

2. Identification of the test statistic and its distribution. Under the assumption
that the two populations are normally distributed with equal variances, the
appropriate test statistic is

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}
            {\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}} \]

which is distributed as Student’s t with \(n_1 + n_2 - 2\) degrees of freedom.

3. Specification of the significance level. Let \(\alpha = 0.01\).

4. Statement of the decision rule. If the computed value of t is greater than or
equal to the critical t for \(n_1 + n_2 - 2\) degrees of freedom and \(\alpha = 0.01\), reject
\(H_0\).

5. Collection of the data and performance of the calculations. A random sample
of 10 rings from Machine 1 and 15 rings from Machine 2 gives the following
results: \(\bar{x}_1 = 1.051\), \(\bar{x}_2 = 1.036\), \(s_1^2 = 0.000441\), and \(s_2^2 = 0.000225\). The
pooled estimate of the common population variance is

\[ s_p^2 = \frac{9(0.000441) + 14(0.000225)}{23} = 0.000310 \]

The value of the test statistic that we may compute from the sample data is

\[ t = \frac{(1.051 - 1.036) - 0}
            {\sqrt{\dfrac{0.000310}{10} + \dfrac{0.000310}{15}}} = 2.09 \]

6. Making the statistical decision. Since the computed t of 2.09 is less than the
critical t of 2.5, we cannot reject the null hypothesis.

7. Making the administrative decision. Since we did not reject \(H_0\), the quality-control
department cannot conclude from the test results that Machine 1 is producing
steel rings with a larger inside diameter than Machine 2. Since
2.0687 < t < 2.500, then 0.025 > p > 0.01.
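
The pooled calculation of Example 7.6.2 can be checked with a few lines of code. This is a minimal sketch using Python's standard library; the variable names are illustrative, and the critical value 2.500 for 23 degrees of freedom is taken from the text rather than computed.

```python
import math

# Sample results from Example 7.6.2.
n1, n2 = 10, 15
x_bar1, x_bar2 = 1.051, 1.036
s1_sq, s2_sq = 0.000441, 0.000225

# Pooled estimate of the common population variance.
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Test statistic of Equation 7.6.2 with hypothesized difference 0.
t = ((x_bar1 - x_bar2) - 0) / math.sqrt(sp_sq / n1 + sp_sq / n2)

critical_t = 2.500        # 23 degrees of freedom, alpha = 0.01 (from the text)
reject_h0 = t >= critical_t
print(round(sp_sq, 6), round(t, 2), reject_h0)
```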


Unequal Variances When the population variances are not equal, there is, of
course, no basis for pooling \(s_1^2\) and \(s_2^2\). The test statistic, then, is based on

\[ t' = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}
             {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]

which is distributed approximately as Student’s t with degrees of freedom df'
given by Equation 6.6.3.


EXAMPLE 7.6.3 In Example 7.6.2, suppose that we do not know whether the
population variances are equal, and that we are unwilling to assume that they are
equal. The critical value of the test statistic, then, will be t for \(\alpha = 0.01\)
(one-sided test), with degrees of freedom equal to

\[ df' = \frac{[(0.000441/10) + (0.000225/15)]^2}
              {\dfrac{(0.000441/10)^2}{10} + \dfrac{(0.000225/15)^2}{15}}
       = 16.67 \approx 17 \]

Table E shows the critical value to be 2.567.

The value of the test statistic that we can compute from the sample data is

\[ t' = \frac{(1.051 - 1.036) - 0}
             {\sqrt{\dfrac{0.000441}{10} + \dfrac{0.000225}{15}}} = 1.95 \]

Since 1.95 < 2.567, we cannot reject \(H_0\) on the basis of this test.
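
The degrees of freedom and test statistic of Example 7.6.3 can be reproduced as follows. This is a sketch only: it follows the divisors displayed in the example (10 and 15), which is the assumption made here about the form of Equation 6.6.3. Many references state the Welch-Satterthwaite approximation with \(n_i - 1\) in those positions, which gives a somewhat smaller value. The variable names are illustrative, and the critical value 2.567 is taken from the text.

```python
import math

# Sample results carried over from Example 7.6.2.
n1, n2 = 10, 15
x_bar1, x_bar2 = 1.051, 1.036
s1_sq, s2_sq = 0.000441, 0.000225

v1, v2 = s1_sq / n1, s2_sq / n2     # estimated variances of the two sample means

# Approximate degrees of freedom, using the divisors shown in the example.
df_prime = (v1 + v2) ** 2 / (v1 ** 2 / n1 + v2 ** 2 / n2)

# Test statistic without pooling, hypothesized difference 0.
t_prime = ((x_bar1 - x_bar2) - 0) / math.sqrt(v1 + v2)

critical_t = 2.567                  # 17 degrees of freedom, alpha = 0.01 (from the text)
reject_h0 = t_prime >= critical_t
print(round(df_prime, 2), round(df_prime), round(t_prime, 2), reject_h0)
```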


Carry out the seven-step hypothesis-testing procedure at the indicated level of significance 
and compute the p value for each test. 

7.6.1 A textile manufacturer can buy a certain type of yarn from one of two vendors. The
vendors’ products appear to be comparable in all respects except price and, possibly,
breaking strength. The manufacturer will buy from Vendor 1 (whose price is lower) unless
there is reason to believe that Vendor 1’s product has a lower mean breaking strength than
Vendor 2’s. Random samples are drawn from the two vendors’ stocks with the following
results. Assume that breaking strengths are approximately normally distributed. (a) Based
on an appropriate hypothesis test with \(\alpha = 0.05\), would you advise the manufacturer to
buy the cheaper yarn? Assume that the population variances are equal. (b) Repeat part (a)
under the assumption that the population variances are not equal.

Vendor 1   n = 10   \(\bar{x} = 94\)   \(s^2 = 14\)

Vendor 2   n = 12   \(\bar{x} = 98\)   \(s^2 = 9\)


7.6.2 The following data are based on random samples taken from two shifts in a certain
factory. The variable of interest is the length of time needed to do a certain task. Do these
data provide sufficient evidence to indicate that the average time needed on Shift 2 is less
than that on Shift 1? Let \(\alpha = 0.05\). Specify all assumptions that you have to make in
order to validate your procedure.

Shift 1   n = 10   \(\bar{x} = 26.1\)   \(s^2 = 144\)

Shift 2   n = 8    \(\bar{x} = 17.6\)   \(s^2 = 110\)


7.6.3 A manufacturer of a sleeping medicine is comparing the effectiveness of a new
formula, B, with Formula A which is now on the market. For three nights 25 subjects try
Formula B and 25 subjects in an independent sample try Formula A. The variable of
interest is the average number of additional hours of sleep (compared with nights when
no drug is taken) the subjects get for the three nights. The results are as follows. Do these
data provide sufficient evidence to indicate that Formula B is better than Formula A? Let
\(\alpha = 0.01\).

Medicine   \(\bar{x}\)   \(s^2\)
   A        1.4     0.09
   B        1.9     0.16



7.6.4 An industrial psychologist feels that a big factor in job turnover among assembly-line
workers is the individual employee’s self-esteem. She thinks that workers who change
jobs often (Population A) have, on the average, lower self-esteem, as measured by a 
standardized test, than workers who do not (Population B). To determine whether she 
could support her belief with statistical analysis, she draws a simple random sample of 
employees from each population, and gives each a test measuring self-esteem. The results 
are as follows: 


Group A 60 45 42 62 68 54 52 55 44 41 

Group B 70 72 74 74 76 91 71 78 76 78 83 50 52 66 65 53 52 


The psychologist believes that the relevant populations of scores are normally distributed, 
with equal, although unknown, variances. At the 0.01 level of significance, what should 
she conclude? What use can she make of her findings? 

7.6.5 In a university finance class, an argument arose over the contention of some members 
of the class that men have a better knowledge of the stock market than women. To settle 
the argument, the instructor gave a test to measure knowledge of the stock market to a 
random sample of 15 male students and an independent random sample of 15 female 
students. The results were as follows: 


Women: 73 96 74 55 91 50 46 82 43 79 79 50 46 81 83 

Men: 57 78 42 44 91 65 63 60 97 85 92 42 86 81 64 


Can one conclude on the basis of these data that male students, on the average, have a
better knowledge of the stock market than female students? Let \(\alpha = 0.05\). What
assumptions are necessary?


7.7 THE DIFFERENCE BETWEEN THE MEANS OF TWO POPULATIONS 
NOT NORMALLY DISTRIBUTED 


When samples are drawn from nonnormally distributed populations, we can use 
the results of the central limit theorem if the sample sizes are large. This lets us 
use normal theory, since the sampling distribution will be approximately normally 
distributed. 

The appropriate test statistic for hypothesis-testing purposes is based on

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}
            {\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \tag{7.7.1} \]


If the population variances are unknown, we use the sample variances as estimates. 
We do not pool the sample variances, however, since we don’t need to assume 
equality of population variances when we use the z statistic. 


EXAMPLE 7.7.1 A market-research firm wishes to know if it can conclude that the
mean number of hours of television viewing per week by families in a certain
type of community (Type A) is less than that in another type of community (Type
B). Independent random samples give the following information:


                                    Type A    Type B

Number of families interviewed        100        75
Average number of hours of
  television viewing per week       18.50     27.25
Standard deviation                     10        14


The results of the following hypothesis test will help answer the question. 

1. Statement of hypotheses 

H₀: μ_A ≥ μ_B,    H₁: μ_A < μ_B

2. Identification of the test statistic and its distribution. The functional forms of
the populations are not given. However, since the sample sizes are large, we rely
on the central limit theorem and assume that the statistic x̄_A - x̄_B is approximately
normally distributed. If σ_A² and σ_B² were known, the appropriate test statistic would
be given by Equation 7.7.1. Since these parameters are unknown, we compute

$$z = \frac{(\bar{x}_A - \bar{x}_B) - (\mu_A - \mu_B)}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}$$

which is approximately normally distributed.


3. Specification of the significance level. Let α = 0.05.

4. Statement of the decision rule. If the computed value of the test statistic is less
than or equal to -1.645, reject H₀.

5. Collection of the data and performance of the calculations. From the data 
given in the problem statement, we may compute the following value of the test 
statistic: 

$$z = \frac{(18.50 - 27.25) - 0}{\sqrt{\dfrac{100}{100} + \dfrac{196}{75}}} = -4.60$$

6. Making the statistical decision. Since -4.60 < -1.645, we reject H₀.

7. Making the administrative decision. Since we reject H₀, we can conclude that
μ_A is less than μ_B. For this test, p < 0.001.

Here we have seen that when two populations are not normally distributed, we 
can test the hypothesis that the two population means are equal if the samples are 
large enough to apply the central limit theorem. When the data consist of small 
samples drawn from populations that are not normally distributed, we need a 
hypothesis test appropriate for such a situation. Chapter 12 presents such a test, 
the Mann-Whitney test. 
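
The arithmetic of Example 7.7.1 can be checked with a short Python sketch. It is an illustration only and not part of the original text: the function name `two_sample_z` and its parameter names are assumptions, and the normal distribution comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Large-sample z statistic for a difference in means (variances not pooled)."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


# Summary data from Example 7.7.1 (Type A versus Type B communities).
z = two_sample_z(18.50, 10, 100, 27.25, 14, 75)

# Left-tailed test: the alternative says the Type A mean is smaller.
critical_value = NormalDist().inv_cdf(0.05)   # about -1.645
p_value = NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"reject H0: {z <= critical_value}")
print(f"p value below 0.001: {p_value < 0.001}")
```

The sample standard deviations enter the standard error separately, which mirrors the text's point that the sample variances are not pooled for the large-sample z statistic.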




Carry out the seven-step hypothesis-testing procedure at the indicated level of significance 
and compute the p value for each test. 

7.7.1 A paper manufacturer is thinking of buying one of two tracts of timberland. The 
size of the trees on each tract is important. Measurements of trunk diameter for a random 
sample of 50 trees from each tract gave the following results. Do these data provide 
sufficient evidence at the 0.05 level of significance to indicate that trees on Tract B are, 
on the average, smaller than trees on Tract A? 

Tract A    x̄ = 28.25"    s² = 25
Tract B    x̄ = 22.50"    s² = 16


7.7.2 An analyst is studying the advertising practices of two types of retail firms. One 
variable is the amount spent on advertising during the preceding year. An independent 
random sample is drawn from each type of firm, with the following results. Can we 
conclude from these data that Type A firms spent more for advertising, on the average, 
than Type B firms? Let α = 0.05.


Type A    n = 60    x̄ = $14,800    s² = 180,000
Type B    n = 70    x̄ = $14,500    s² = 133,000


7.7.3 A random sample of 100 families from Community A and a random sample of 150 
families from Community B yield the following data on length of residence in current 
home. Do these data provide sufficient evidence to indicate that, on the average, families 
in Community A have been living in their current homes for less time than families in 
Community B have? Let α = 0.05.


Community A    x̄ = 33 months    s² = 900
Community B    x̄ = 49 months    s² = 1,050


7.7.4 A researcher interviews a random sample of male executives and a random sample 
of married, unemployed, middle-class adult females regarding their exposure to advertising 
through radio, television, newspapers, and magazines. One variable is the number of ads 
to which each subject is exposed on a typical weekday. The results are shown in the 
following table. Do these data provide sufficient evidence to indicate that, on the average, 
the sampled female population is exposed to more ads than the sampled population of male 
executives? Let α = 0.01.

                               Mean number of ads    Standard
Group                   n      to which exposed      deviation

Male executives        100           200                50
Unemployed females     144           225                60

7.7.5 A manufacturer of electrical wire wants to compare two types of wire with respect 
to resistance per unit length. Thirty specimens of wire 1 and 35 specimens of wire 2 yield 
the following measurements in ohms x 10 2 . 


Wire 1

55.2 53.5 52.3 54.1 57.4 53.9 58.1 50.6 53.1 50.6
53.1 59.7 52.4 50.5 53.5 59.4 51.8 50.8 46.9 52.9
57.1 56.9 56.3 59.1 55.7 51.2 55.2 52.7 56.1 58.2

Wire 2

46.9 50.6 47.3 48.0 48.2 47.4 48.1 49.4 51.1 50.9 49.5 49.7
49.2 48.4 48.5 47.4 49.7 49.1 51.4 48.1 49.7 48.6 48.2 50.2
49.3 50.3 50.8 50.9 48.6 47.2 50.3 49.1 48.3 47.7 48.5

Can we conclude on the basis of these data that the populations differ with respect to mean
resistance? Let α = 0.05.


7.7.6 A manufacturer wants to compare the viscosity of two brands of motor oil. Thirty- 
two randomly selected specimens of each brand are analyzed, with the following results. 
(The data are coded for ease of computation.) 


Brand A

13 21 60 35 38 10 36 24 35 35 45 19 42 11 35 39 25
17 51 25 52 25 11 11 55 44 25 41 16 47 50 18

Brand B

46 52 66 65 71 67 47 48 58 42 66 69 60 80 45 47 69
75 43 46 74 73 43 70 51 72 65 45 76 48 56 64

Can one conclude on the basis of these data that the mean viscosity of the two brands
differs? Let α = 0.05.



7.8 TESTING A HYPOTHESIS ABOUT A POPULATION PROPORTION 

We come now to hypothesis testing when the parameter of interest is the proportion
of elements that have a given characteristic. We call the elements with the
characteristic “successes,” and designate the proportion of successes by p.

Chapter 4 showed that the binomial probability distribution is the correct model 
when we are considering the number of elements out of a total of n elements that 
have a certain characteristic. 

When n is large, the work it takes to find the probability of some specified
number of successes, using the binomial formula, is less than appealing. However,
as we pointed out in Chapter 5, when np and n(1 - p) are both greater than 5,
the binomial distribution may be approximated by the normal distribution. When
n/N is also ≤ 0.05, the appropriate test statistic for testing hypotheses about
population proportions is

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} \qquad (7.8.1)$$

where p₀ is the hypothesized proportion, q₀ = 1 - p₀, and p̂ is the sample
proportion. This statistic is distributed approximately as the standard normal. If
n is large relative to N, we use a finite population correction in Equation 7.8.1.
As we noted, in the examples and exercises of this chapter, we shall assume that
n is small relative to N, so that we can ignore the correction factor. We use p₀ in
the denominator of Equation 7.8.1 rather than p̂ because we assume H₀ to be true
while we are conducting the test.

EXAMPLE 7.8.1 The president of a certain firm, concerned about the safety record 
of the firm’s employees, sets aside $15,000 a year for safety education. The firm’s 
accountant believes that more than 75% of similar firms spend more than $15,000 
a year on safety education. When the president asks the accountant for evidence 
to support this belief, the accountant responds with the following hypothesis test. 





1. Statement of hypotheses 

H₀: p ≤ 0.75,    H₁: p > 0.75

2. Identification of the test statistic and its distribution. The accountant decides 
to obtain information from a simple random sample of 60 firms. This sample is 
large enough to enable the accountant to use Equation 7.8.1. 

3. Specification of the significance level. Let α = 0.05.

4. Statement of the decision rule. If the computed value of the test statistic is
greater than or equal to 1.645, reject H₀.

5. Collection of the data and performance of the calculations. Of the 60 firms,
50 state that they spend more than $15,000 per year on safety education. Therefore
p̂ = 50/60 = 0.83, and the computed value of the test statistic is

$$z = \frac{0.83 - 0.75}{\sqrt{\dfrac{(0.75)(0.25)}{60}}} = 1.43$$

6. Making the statistical decision. Since 1.43 < 1.645, we cannot reject the null
hypothesis.

7. Making the administrative decision. Even though the sample proportion is 
greater than 0.75, the test results do not support the accountant’s hypothesis. We 
should conclude that the true proportion with the characteristic of interest may be 
less than or equal to 0.75. For this test, p = 1 - 0.9236 = 0.0764. 
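
The steps of Example 7.8.1 can be sketched in Python as follows. This is a hedged illustration rather than the text's own method of computation: `one_proportion_z` is a hypothetical helper name, and the sample proportion is rounded to 0.83 only so that the result lines up with the hand calculation above.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(p_hat, p0, n):
    """z statistic for a hypothesized proportion p0, with p0 in the standard error."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


# Example 7.8.1: 50 of 60 firms; the proportion is rounded to 0.83 as in the text.
p_hat = round(50 / 60, 2)
z = one_proportion_z(p_hat, p0=0.75, n=60)

# Right-tailed test at the 0.05 level.
critical_value = NormalDist().inv_cdf(0.95)   # about 1.645
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, p value = {p_value:.4f}")
print(f"reject H0: {z >= critical_value}")
```

Because the sketch evaluates the normal distribution at the unrounded z rather than at the tabled value 1.43, its p value can differ from 0.0764 in the fourth decimal place.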

Exercises

Carry out the seven-step hypothesis-testing procedure at the desired level of significance
and compute the p value for each test.

7.8.1 A self-help club is considering the promotion of a home study course leading to a 
high school diploma for members who have not finished high school. The president of the 
club thinks that fewer than 25% of the members have not completed high school, and 
would like to support this belief with an appropriate hypothesis test. Of a random sample 
of 200 members, 42 indicate that they have not completed high school. Do these data 
support the president’s belief at the 0.05 significance level? 

7.8.2 A college with an enrollment of approximately 10,000 students wants to build a 
new student parking garage. The administration feels that more than 60% of the students 
drive cars to school. If, in a random sample of 250 students, 165 indicate that they drive 
a car to school, is the administration’s position supported? Let α = 0.05.

7.8.3 The head accountant of a company is concerned over the clerical errors on outgoing
invoices, and believes that more than 20% of them contain at least one error. In a random
sample of 400 invoices, 100 are found to contain at least one error. Do these data support
the accountant’s belief? Let α = 0.05.

7.8.4 In a study of job turnover, a researcher interviews a random sample of 200 top- 
level employees who have changed jobs during the past year. Thirty state that they changed 
jobs because they didn’t see much prospect for advancement on their old jobs. Do these 
data provide sufficient evidence at the 0.05 level of significance to indicate that fewer than 
20% of this type of employee changes jobs for this reason? 





7.9 TESTING A HYPOTHESIS ABOUT THE DIFFERENCE BETWEEN TWO 
POPULATION PROPORTIONS 


The manager or the researcher is often interested in the difference between two 
population proportions. We can test the null hypothesis that the difference between 
two population proportions is equal to any given value. However, the hypothesis 
we find most often in practice is that the difference is 0. The correct test statistic 
for testing hypotheses about the difference between two population proportions is 
based on 


$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}} \qquad (7.9.1)$$


where the samples are independent simple random samples. Since p₁ and p₂, the
true population proportions, are unknown, we must estimate them. The best available
estimates usually are the sample proportions.

The null hypothesis that p₁ - p₂ = 0 is equivalent to the hypothesis that the
two population proportions are equal. We may use this as justification for combining
the results of the two samples. We thus obtain a pooled estimate of the
hypothesized common proportion, which is given by

$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} \qquad (7.9.2)$$

where x₁ and x₂ are the number in the first and second sample, respectively, with
the characteristic of interest. We use this pooled estimate of p = p₁ = p₂ to
compute the following standard error:


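$$\hat{\sigma}_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n_1} + \frac{\bar{p}(1 - \bar{p})}{n_2}} \qquad (7.9.3)$$

The test statistic, then, is

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_1} + \dfrac{\bar{p}(1 - \bar{p})}{n_2}}} \qquad (7.9.4)$$

which is distributed approximately as the standard normal if the null hypothesis
is true.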

Suppose that n₁ and n₂ are fairly close in size, and neither p₁ nor p₂ is too close
to 0 or 1. Then the results we obtain by pooling will, as a rule, not differ very
much from the results we obtain when the data are not pooled. It is never wrong
to pool the data under H₀: p₁ = p₂. Since in some cases it may make a difference,
it is advisable always to pool the data when n₁ and n₂ are unequal.


EXAMPLE 7.9.1 A researcher studies the grocery-shopping habits of city residents.
Interviews with the principal shopper in each of 400 households reveal the
following: Of 225 shoppers with rural backgrounds and 175 shoppers with urban
backgrounds, 54 and 52, respectively, state that they do most of their grocery
shopping at chain stores. We want to decide, on the basis of this sample, whether
or not the two groups differ with respect to where they do most of their grocery
shopping.

1. Statement of hypotheses 


H₀: p₁ = p₂,    H₁: p₁ ≠ p₂

2. Identification of the test statistic and its distribution. The test statistic is given 
by Equation 7.9.4. 

3. Specification of the significance level. Let α = 0.05.

4. Statement of the decision rule. If the computed value of the test statistic is
greater than or equal to +1.96 or less than or equal to -1.96, we reject H₀.

5. Collection of the data and performance of the calculations. From the information
given in the problem statement, we compute the value of the test statistic
as follows. By Equation 7.9.2, we have

$$\bar{p} = \frac{54 + 52}{225 + 175} = \frac{106}{400} = 0.265$$

The test statistic, by Equation 7.9.4, is

$$z = \frac{(0.240 - 0.297) - 0}{\sqrt{\dfrac{(0.265)(0.735)}{225} + \dfrac{(0.265)(0.735)}{175}}} = -1.28$$

6. Making the statistical decision. Since -1.28 > -1.96, we do not reject the
null hypothesis.

7. Making the administrative decision. On the basis of the data given, we conclude
that the two proportions may be equal. These data do not allow us to accept
the alternative hypothesis. For this test, p = 2(0.1003) = 0.2006.

Let us now test the hypothesis using the unpooled estimate of the standard error. 
That is, let us base the test statistic on Equation 7.9.1. 

$$z = \frac{(0.240 - 0.297) - 0}{\sqrt{\dfrac{(0.240)(0.760)}{225} + \dfrac{(0.297)(0.703)}{175}}} = -1.27$$

We see that, in the present example, pooling has had a negligible effect. 
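
The pooled and unpooled calculations of Example 7.9.1 can be compared side by side in a small Python sketch. It is an illustration under stated assumptions, not the text's procedure: `two_proportion_z` and its `pooled` flag are hypothetical names, and the function works from the raw counts rather than from proportions rounded to three decimals.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2, pooled=True):
    """z statistic for equal proportions, from success counts and sample sizes."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    if pooled:
        p_bar = (x1 + x2) / (n1 + n2)                     # pooled estimate, Equation 7.9.2
        variance = p_bar * (1 - p_bar) * (1 / n1 + 1 / n2)
    else:
        variance = p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    return (p1_hat - p2_hat) / sqrt(variance)


# Example 7.9.1: 54 of 225 rural-background and 52 of 175 urban-background shoppers.
z_pooled = two_proportion_z(54, 225, 52, 175, pooled=True)
z_unpooled = two_proportion_z(54, 225, 52, 175, pooled=False)

print(f"pooled z   = {z_pooled:.3f}")
print(f"unpooled z = {z_unpooled:.3f}")
print(f"both inside (-1.96, 1.96): {abs(z_pooled) < 1.96 and abs(z_unpooled) < 1.96}")
```

Since the sketch carries unrounded proportions, its values may differ from the hand calculations in the second decimal place, but the two statistics remain close to each other and lead to the same decision.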

Sometimes the hypothesized difference between population proportions is other 
than 0. In these cases, it is not correct to pool the sample data. The following 
example shows this. 


EXAMPLE 7.9.2 A market researcher believes that the proportion of households in 
Area A with two or more cars exceeds by more than 0.05 the proportion of 
households in Area B with two or more cars. 

To see whether the facts support this hypothesis, the researcher conducts a 
survey among Area A and Area B households, with the following results. 

                       Number of households with
Area    Sample size    two or more cars

A       n_A = 150      113
B       n_B = 160      104


1. Statement of hypotheses 

H₀: p_A - p_B ≤ 0.05,    H₁: p_A - p_B > 0.05

2. Identification of the test statistic and its distribution. The relevant statistic is
p̂_A - p̂_B, which is considered to be approximately normally distributed (since n_A
and n_B are large). If H₀ is true, the mean of the distribution is 0.05 or less (we
test at 0.05). The test statistic is as follows:

$$z = \frac{(\hat{p}_A - \hat{p}_B) - 0.05}{\sqrt{\dfrac{\hat{p}_A(1 - \hat{p}_A)}{n_A} + \dfrac{\hat{p}_B(1 - \hat{p}_B)}{n_B}}}$$

When H₀ is true, the test statistic is distributed approximately as the standard
normal.


3. Level of significance. Let α = 0.05.

4. Statement of the decision rule. If the computed value of the test statistic is
greater than or equal to 1.645, reject H₀.

5. Collection of the data and performance of the calculations. From the sample
data, we compute p̂_A = 113/150 = 0.75 and p̂_B = 104/160 = 0.65.

The standard error is

$$\sqrt{\frac{(0.75)(0.25)}{150} + \frac{(0.65)(0.35)}{160}} = 0.05$$

which allows us to compute

$$z = \frac{(0.75 - 0.65) - 0.05}{0.05} = 1.00$$

6. Making the statistical decision. Since the computed z of 1.00 is less than 1.645,
we do not reject H₀.

7. Making the administrative decision. We may not conclude, on the basis of
these data, that the market researcher’s hypothesis is true. For this test, we have
p = 1 - 0.8413 = 0.1587.
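
When the hypothesized difference is not 0, the same unpooled statistic applies with the hypothesized value subtracted in the numerator. The following Python sketch illustrates this for Example 7.9.2; the helper name `two_proportion_z_diff` and its parameters are assumptions introduced here, not notation from the text.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_diff(x_a, n_a, x_b, n_b, hypothesized_diff):
    """Unpooled z statistic for a nonzero hypothesized difference pA - pB."""
    p_a, p_b = x_a / n_a, x_b / n_b
    standard_error = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return ((p_a - p_b) - hypothesized_diff) / standard_error


# Example 7.9.2: 113 of 150 Area A households and 104 of 160 Area B households.
z = two_proportion_z_diff(113, 150, 104, 160, hypothesized_diff=0.05)
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, p value = {p_value:.4f}")
print(f"reject H0 at the 0.05 level: {z >= 1.645}")
```

The hand calculation rounds the proportions to 0.75 and 0.65 and the standard error to 0.05, so the sketch's z is slightly above 1.00; the decision not to reject is unchanged.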




Exercises 





Carry out the seven-step hypothesis-testing procedure at the desired level of significance 
and compute the p value for each test. 

7.9.1 A firm that makes carpeting is seeking a material that can withstand temperatures
of up to 250°F. Two materials, one a natural material, the other a synthetic (and cheaper)
material, are equally satisfactory in all respects except, possibly, heat tolerance. Simple
random samples of 225 specimens of each of the two materials are tested for this
characteristic. The samples are independently drawn. Thirty-six specimens of the natural
material and 45 of the synthetic material fail at temperatures below 250°F. Can we conclude
from these data that the two materials are different with respect to heat tolerance?
Let α = 0.05.

7.9.2 A large corporation finds that 63% of the 150 salespeople who have never had a
self-improvement course would like such a course. The firm did a similar study 10 years
before. Then only 58% of 160 salespeople wanted a self-improvement course. At the 0.05
level of significance, test the null hypothesis that salespeople are no more eager for
self-improvement courses this year than they were 10 years ago. The groups are assumed to
constitute two independent simple random samples.

7.9.3 A simple random sample of 200 Type A industrial firms shows that 12% of them 
spend more than 1% of their total sales for advertising. An independent simple random 
sample of the same size from Type B firms shows that 15% spend more than 1% of their 
sales for advertising. Let α = 0.05, and test


H₀: p_B ≤ p_A against the alternative H₁: p_B > p_A


7.9.4 You conduct a study of the leisure-time activities of business people in a certain 
city. A simple random sample of 400 salespersons and an independent simple random 
sample of 400 business people not engaged in selling yield the following results: 288 
salespersons and 260 nonsales business people say that their leisure-time activities are 
mainly sports-oriented. Would you conclude on the basis of these data that a smaller 
proportion of nonsales business people than salespersons spend their leisure time in
sports-oriented activities? Let α = 0.05.


7.10 TESTING A HYPOTHESIS ABOUT THE VARIANCE OF A 
NORMALLY DISTRIBUTED POPULATION 

Chapter 6 showed how to construct a confidence interval for the variance of a 
normally distributed population. We may use the same general principles to test 
a hypothesis about a population variance. The appropriate test statistic for testing 
H₀: σ² = σ₀² is

$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} \qquad (7.10.1)$$

where s² is computed from a random sample of size n from a normally distributed
population. When the null hypothesis is true, the test statistic is distributed as χ²
with n - 1 degrees of freedom. We may make both one-sided and two-sided
tests.


Again: This procedure requires that the sampled population be normally
distributed. Violation of this assumption can yield misleading results.




EXAMPLE 7.10.1 Specifications for a certain type of steel plate state that the
variance in weight shall not exceed 0.016 lb². A random sample of 25 plates yields
a variance of 0.025. Should we conclude from these data that specifications are
not being met? Yes, we can, if we can reject the null hypothesis that the population
variance is less than or equal to 0.016.

We may use the seven-step hypothesis-testing procedure. 

1. Statement of hypotheses 

H₀: σ² ≤ 0.016,    H₁: σ² > 0.016

2. Identification of the test statistic and its distribution. Equation 7.10.1 gives the 
test statistic. We assume that the population of weights is approximately normally 
distributed. 

3. Specification of the significance level. Let α = 0.05.

4. Statement of the decision rule. For α = 0.05 and 24 degrees of freedom, the
critical value of χ² is 36.415. If the computed value of χ² is greater than or equal
to 36.415, reject H₀.

5. Collection of the data and performance of the calculations. From the infor¬ 
mation given in the problem, we can compute the following value of the test 
statistic: 

$$\chi^2 = \frac{24(0.025)}{0.016} = 37.5$$

6. Making the statistical decision. Since 37.5 > 36.415, we reject H₀.

7. Making the administrative decision. The data indicate that the variance
specifications are not being met.

For this test, 0.05 > p > 0.025, since 36.415 < 37.5 < 39.364. When the
alternative hypothesis is two-sided, a complication arises in the calculation of the
p value associated with a chi-square statistic. Since the chi-square distribution is
not symmetric, it is not correct to double the one-sided p value, as we have done
for two-sided tests based on symmetric distributions. For two-sided tests based on
asymmetric distributions, we may report the one-sided p value accompanied by a
statement indicating the direction of the observed departure from the null
hypothesis.
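
The statistic and the one-sided p value of Example 7.10.1 can be checked with the following Python sketch. It is an illustration only. Both function names are hypothetical, and the tail-area helper is not a general chi-square routine: it relies on a closed-form sum that holds only when the degrees of freedom are even, which happens to be the case here (24).

```python
from math import exp


def variance_chi_square(sample_variance, n, sigma0_squared):
    """Chi-square statistic (n - 1)s^2 / sigma0^2 of Equation 7.10.1."""
    return (n - 1) * sample_variance / sigma0_squared


def chi_square_upper_tail(x, df):
    """P(chi-square > x) for EVEN df only, using the closed-form Poisson sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even degrees of freedom only")
    half = x / 2
    term, total = 1.0, 0.0
    for j in range(df // 2):
        total += term               # adds half**j / j!
        term *= half / (j + 1)
    return exp(-half) * total


# Example 7.10.1: n = 25 plates, s^2 = 0.025, hypothesized variance 0.016.
chi_sq = variance_chi_square(0.025, 25, 0.016)
p_value = chi_square_upper_tail(chi_sq, df=24)

print(f"chi-square = {chi_sq:.1f}")
print(f"reject H0 (critical value 36.415): {chi_sq >= 36.415}")
print(f"p value between 0.025 and 0.05: {0.025 < p_value < 0.05}")
```

For odd degrees of freedom a statistical library or the appendix table would be needed in place of `chi_square_upper_tail`.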

Exercises

Carry out the seven-step hypothesis-testing procedure at the desired level of significance
and compute the p value for each test.

7.10.1 A simple random sample of size 21 from a normally distributed population gives
a variance of 10. Test the null hypothesis that σ² = 15 against the alternative that σ² ≠
15. Let α = 0.05.




7.10.2 The inside diameters of metal washers have a variance of 0.00005 in² or less when
the process by which they are made is under control. A random sample of 31 washers
taken from the assembly line yields a variance of 0.000061 in². Do these data provide
sufficient information to indicate that the process is out of control? Let α = 0.05. What
assumption must we make in order to answer the question?

7.10.3 The tensile strength of a synthetic fiber must have a variance of 5 or less before
it is acceptable to a certain manufacturer. A random sample of 25 specimens taken from
a new shipment gives a variance of 7. Does this provide sufficient grounds for the
manufacturer to refuse the shipment? Let α = 0.05 and assume that tensile strength of the fiber
is approximately normally distributed.


7.11 THE RATIO OF THE VARIANCES OF TWO NORMALLY 
DISTRIBUTED POPULATIONS 

The use of the t distribution in constructing confidence intervals and testing
hypotheses for the difference between two population means assumes that the
population variances are equal. We compute estimates of the population variances
from samples taken from the two populations. The obvious question is: Are the
observed differences between the sample variances indicative of a real difference
in population variances? Or could they have come about because of chance alone
when the population variances are, in fact, equal? Suppose that we examine the
sample variances and conclude that the two population variances are not equal.
We either discard the t test or use the modification discussed for unequal variances.

Two machines may produce items that are equal with respect to the mean value 
of some critical measurement. However, the variability among items produced by 
one of the machines may be greater. We would like some method on which to 
base a decision as to whether this is likely to be so. This example shows a case 
in which we want to know whether or not two population variances are equal. 

Decisions about the equality of two population variances are based on the
variance ratio or F test. From Chapter 6, recall that when certain assumptions
are met, the quantity (s₁²/σ₁²)/(s₂²/σ₂²) is distributed as F with n₁ - 1 numerator
degrees of freedom and n₂ - 1 denominator degrees of freedom. Under the null
hypothesis that σ₁² = σ₂², we assume that the hypothesis is true, and that the two
variances cancel out. This leaves s₁²/s₂², which follows the same F distribution.
The test statistic, then, for testing H₀: σ₁² = σ₂² is

$$F = \frac{s_1^2}{s_2^2} \qquad (7.11.1)$$


For a two-sided test, we place the larger sample variance in the numerator. We
then find the critical value of F for α/2 and the appropriate degrees of freedom.
However, for a one-sided test, which of the two sample variances to put in the
numerator is predetermined by the statement of the null hypothesis. For example,
for the null hypothesis that σ₁² ≤ σ₂², the appropriate test statistic is F = s₁²/s₂².
The critical value of F is obtained for α (not α/2) and the appropriate degrees of
freedom. Similarly, if the null hypothesis is that σ₁² ≥ σ₂², the appropriate test
statistic is F = s₂²/s₁². In all cases, the decision rule is to reject the null hypothesis
if the computed F is equal to or greater than the critical value of F.


EXAMPLE 7.11.1 A company has two types of training programs for new
employees. New employees are alternately assigned to one or the other program. At the
end of the training period, each is given the same examination. There are 22 in
the first group and 25 in the second group. Assume that the populations are
approximately normally distributed. A t test is used to test whether the mean
scores of the two groups are significantly different. The variance for the first group
is s₁² = 70.3. The variance for the second group is s₂² = 225.5. Do these data
cast doubt on the assumption of equal variances necessary for the valid use of the
t test?

We may use the results of a hypothesis test to help answer the question. 

1. Statement of hypotheses 

H₀: σ₁² = σ₂²,    H₁: σ₁² ≠ σ₂²


2. Identification of the test statistic and its distribution. Equation 7.11.1 gives the
appropriate test statistic, under the assumption that the samples came from
normally distributed populations.

3. Specification of the significance level. Let α = 0.05.

4. Statement of the decision rule. If the computed value of F is greater than or
equal to the critical F for α = 0.05 and 24 and 21 degrees of freedom, we reject
H₀. Since we have a two-sided alternative, we select from the table the value of
F that has α/2 = 0.025 of the area under the curve to its right. From Appendix
Table G, we find the critical value of F to be 2.37.

5. Collection of the data and performance of the calculations. From the information
given in the problem statement, we compute the following value of the
test statistic:

$$F = \frac{225.5}{70.3} = 3.21$$


6. Making the statistical decision. Since 3.21 > 2.37, we reject H₀ and conclude
that the assumption of equal population variances is not met. In other words, we
feel that a variance ratio as large as that observed did not come about as a result
of chance alone, but is the result of a false null hypothesis.

7. Making the administrative decision. We should use the test statistic t'.


Since we have a two-sided alternative hypothesis, and since the F distribution
is asymmetric, we report the one-sided p value, along with a statement of the
direction of the departure from H₀. Since 3.21 is greater than 3.15, the value of
F_0.995 for 24 and 21 degrees of freedom, the one-sided p value is less than 0.005.
Thus we can say that p < 0.005 (right-tail probability).
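
The two-sided variance-ratio calculation of Example 7.11.1 can be sketched in Python as follows. This is a hedged illustration: `variance_ratio` is a hypothetical helper, and the critical value is simply copied from the table lookup reported above, because the Python standard library offers no F distribution.

```python
def variance_ratio(var1, n1, var2, n2):
    """Two-sided variance-ratio statistic with the larger sample variance on top.

    Returns (F, numerator degrees of freedom, denominator degrees of freedom).
    """
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1


# Example 7.11.1: first group variance 70.3 (n = 22), second group variance 225.5 (n = 25).
f_stat, df_num, df_den = variance_ratio(70.3, 22, 225.5, 25)

# Critical value for alpha/2 = 0.025 with 24 and 21 df, as read from the table in the text.
critical_f = 2.37

print(f"F = {f_stat:.2f} with {df_num} and {df_den} degrees of freedom")
print(f"reject H0: {f_stat >= critical_f}")
```

Returning the degrees of freedom together with F keeps the table lookup tied to whichever sample variance ended up in the numerator.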




Carry out the seven-step hypothesis-testing procedure at the desired level of significance 
and compute the p value for each test. 

7.11.1 A person is considering the use of the t test to test the difference between two 
means. Two samples of size 16 yield variances of 28.5 and 9.5, respectively. Do the data 
indicate that the t test is inappropriate on the basis of the assumption of equality of 
population variances? Let α = 0.05.

7.11.2 A pilot sample (n = 25) yields a variance of 96.0, which is used in determining
the sample size needed for a survey. The variance computed from the sample survey data
(n = 121) is 144. Do these results indicate that the estimate of the pilot-sample variance
may have been too low? Let α = 0.05.

7.11.3 A study is designed to compare two drugs for relieving tension among employees 
in stressful jobs. A medical team collects data on levels of tension of the subjects in two 
treatment groups at the end of the first two months of treatment. The variances computed 
from the sample data are s₁² = 2916 and s₂² = 4624. There are 8 subjects in each group.
At the 0.05 level of significance, do these data provide sufficient evidence to suggest that 
the variability in tension levels is different in the two populations represented by the 
samples? State all necessary assumptions. 


7.12 THE TYPE II ERROR AND THE POWER OF A TEST 

In our discussion of hypothesis testing so far, we have said a lot about α, the
probability of committing a Type I error (rejecting a true null hypothesis). We
have said little about β, the probability of committing a Type II error (failing to
reject a false null hypothesis). This is because, for a given test, α is a single
number that the investigator assigns. β, on the other hand, may assume one of
many values. Consider the null hypothesis that some population parameter is equal
to some specified value. If H₀ is false and we fail to reject it, we commit a Type
II error. The value of β, the probability that we will commit a Type II error,
depends on the true value of the parameter of interest, the hypothesized value of
the parameter, α, and n, given that the hypothesized value is not the true value.
Thus, before we perform a hypothesis test, we may compute many β’s by postulating
many values for the parameter of interest, for fixed α and n, given that
the hypothesized value is false.

An important fact about a particular hypothesis test has to do with how well
the test controls Type II errors. For a given test in which H₀ is false, we would
like to know with what probability we will reject it. The power of a test, 1 - β,
gives information relevant to this point. It gives the probability that we will reject
a false null hypothesis. We see then that 1 - β, which can be computed for any
alternative value of a parameter, represents the probability that we will take the
correct action when H₀ is false because the true parameter is equal to the one for
which we computed 1 - β. We may, for a given test, construct a power function
that gives possible values of the parameter of interest along with the corresponding
values of 1 - β. The graph of a power function, called a power curve, is a useful
device for quickly assessing the nature of the power of a given test.


EXAMPLE 7.12.1 To show the procedures we use to analyze the power of a test,
let us refer again to Example 7.3.1. In this example, n = 100, σ = 3.6, and
α = 0.05. The hypotheses were

H₀: μ = 17.5,    H₁: μ ≠ 17.5

When we investigate the power of a test, it is convenient to locate the acceptance
and rejection regions on the x̄ scale rather than the z scale. We find the critical
values of x̄ for a two-sided test using the following formulas:

$$\bar{x}_U = \mu_0 + z\frac{\sigma}{\sqrt{n}}$$

and

$$\bar{x}_L = \mu_0 - z\frac{\sigma}{\sqrt{n}}$$

where x̄_U and x̄_L are the upper and lower critical values, respectively, of x̄; +z
and -z are the critical values of z; and μ₀ is the hypothesized value of μ. For the
present example, we have

$$\bar{x}_U = 17.50 + 1.96\frac{3.6}{\sqrt{100}} = 17.50 + 1.96(0.36) = 17.50 + 0.7056 = 18.21$$

and

$$\bar{x}_L = 17.50 - 1.96(0.36) = 17.50 - 0.7056 = 16.79$$

Suppose that H₀ is false, that is, that μ is not equal to 17.5. In that case μ is
equal to some value other than 17.5. We do not know the actual value of μ.
However, if H₀ is false, it is one of the many values that are greater than or
smaller than 17.5. Suppose that the true population mean is μ₁ = 16.5. Then the
sampling distribution of x̄₁ is also approximately normal with μ_x̄ = μ = 16.5.
We may call this sampling distribution f(x̄₁) and the sampling distribution under
the null hypothesis f(x̄₀).

Now $\beta$, the probability of the Type II error of failing to reject a false null 
hypothesis, is the area under the curve of $f(\bar{x}_1)$ that overlaps the acceptance region 
specified under $H_0$. To determine the value of $\beta$, we need to find the area under 
$f(\bar{x}_1)$, above the $\bar{x}$ axis, and between $\bar{x} = 16.79$ and $\bar{x} = 18.21$. The value of $\beta$ 
is equal to $P(16.79 \leq \bar{x} \leq 18.21)$ when $\mu = 16.5$. This is the same as 

$$
\begin{aligned}
P\left(\frac{16.79 - 16.5}{0.36} \leq z \leq \frac{18.21 - 16.5}{0.36}\right) &= P\left(\frac{0.29}{0.36} \leq z \leq \frac{1.71}{0.36}\right) \\
&= P(0.81 \leq z \leq 4.75) \\
&\approx 1 - 0.7910 = 0.2090
\end{aligned}
$$


This means that the probability of taking an appropriate action (that is, rejecting 
$H_0$) when the null hypothesis states that $\mu = 17.5$, when in fact $\mu = 16.5$, is 
$1 - 0.2090 = 0.7910$. As we noted, $\mu$ may be one of a large number of possible 
values when $H_0$ is false. Figure 7.12.1 shows several such possibilities graphically. 
Table 7.12.1 shows the corresponding values of $\beta$ and $1 - \beta$ (which are 
approximate), along with the values of $\beta$ for some additional alternatives. 

FIGURE 7.12.1 

Note that in Figure 7.12.1 and Table 7.12.1 those values of $\mu$ under the alternative 
hypothesis that are closer to the value of $\mu$ specified by $H_0$ have larger 
associated $\beta$ values. For example, when $\mu = 18$ under the alternative hypothesis, 
$\beta = 0.7190$. And when $\mu = 19.0$ under $H_1$, $\beta = 0.0143$. The power of the test 
for these two alternatives, then, is $1 - 0.7190 = 0.2810$ and $1 - 0.0143 = 0.9857$, 
respectively. We may show the power of the test graphically in a power 
curve, as in Figure 7.12.2. Note that the higher the curve, the greater the power. 

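The values of $\beta$ and $1 - \beta$ discussed above can be checked numerically. The following is a minimal Python sketch and not part of the original example: the function name `two_sided_power` is hypothetical, the standard error 0.36 is taken from the arithmetic of Example 7.12.1, and the critical $z$ is computed exactly rather than read from a table, so the results differ slightly from the rounded table values.

```python
from statistics import NormalDist


def two_sided_power(mu0, mu1, se, alpha):
    """Return (beta, power) for a two-sided z test of H0: mu = mu0
    when the true mean is mu1 and the standard error of the mean is se."""
    std_normal = NormalDist()              # standard normal distribution
    z = std_normal.inv_cdf(1 - alpha / 2)  # critical z, about 1.96 for alpha = 0.05
    lower = mu0 - z * se                   # lower critical value of the sample mean
    upper = mu0 + z * se                   # upper critical value of the sample mean
    # beta is the area of the alternative sampling distribution
    # that falls inside the acceptance region [lower, upper]
    beta = std_normal.cdf((upper - mu1) / se) - std_normal.cdf((lower - mu1) / se)
    return beta, 1 - beta


# Alternatives listed in Table 7.12.1; se = 0.36 as used in the example
for mu1 in (16.0, 16.5, 17.0, 18.0, 18.5, 19.0):
    beta, power = two_sided_power(17.5, mu1, 0.36, 0.05)
    print(f"mu1 = {mu1:4.1f}   beta = {beta:.4f}   power = {power:.4f}")
```

Plotting the printed power values against the alternative means would trace the kind of power curve the text describes.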
Thus, although only one value of $\alpha$ is associated with a given hypothesis test, 
there are many values of $\beta$, one for each possible value of $\mu$ if $\mu_0$ is not the true 
value of $\mu$ as hypothesized. Also, unless alternative values of $\mu$ are much larger 
or smaller than $\mu_0$, $\beta$ is relatively large compared with $\alpha$. In general, we use 
hypothesis-testing procedures more often in those cases in which, when $H_0$ is 
false, the true value of the parameter is fairly close to the hypothesized value. In 
most cases, $\beta$, the computed probability of “accepting” a false null hypothesis, 
is larger than $\alpha$, the probability of rejecting a true null hypothesis. These conditions 
are compatible with our earlier statement that a decision based on a rejected 
null hypothesis is more conclusive than a decision based on an “accepted” null 
hypothesis. The probability of being wrong in the latter case is generally larger 
than the probability of being wrong in the former case. 

TABLE 7.12.1 
Values of $\beta$ and $1 - \beta$ for alternative values of $\mu_1$, Example 7.12.1 

$\mu_1$     $\beta$      $1 - \beta$ 
16.0        0.0143       0.9857 
16.5        0.2090       0.7910 
17.0        0.7190       0.2810 
18.0        0.7190       0.2810 
18.5        0.2090       0.7910 
19.0        0.0143       0.9857 

Figure 7.12.2 shows the V-shaped appearance of a power curve for a two-sided 
test. In general, a two-sided test that discriminates well between the value of the 
parameter in $H_0$ and values in $H_1$ results in a narrow V-shaped power curve. A 
wide-spread V-shaped curve indicates that the test discriminates poorly over a 
relatively wide interval of alternative values of the parameter. 

The power curve for a one-sided test with the rejection region in the upper tail 
appears as an elongated S. When the rejection region of a one-sided test is located 
in the lower tail of the distribution, the power curve takes the form of a reverse 
elongated S. The following example shows the nature of the power curve for a 
one-sided test. 

EXAMPLE 7.12.2 The mean time assembly-line employees now take to do a certain 
task on a machine is 65 seconds, with a standard deviation of 15 seconds. The 
times are approximately normally distributed. The manufacturers of a new machine 
claim that their product will reduce the mean time taken to perform the task. 
The quality-control engineers design a test to determine whether or not they should 
believe the claim of the makers of the new machine. They choose a significance 
level of $\alpha = 0.01$ and randomly select 20 employees to perform the task on the 
new machine. The hypotheses are 

$$H_0: \mu \geq 65, \qquad H_1: \mu < 65$$

The quality-control engineers also wish to construct a power curve for the test. 
They compute, for example, the following value of $1 - \beta$ for the alternative 
$\mu = 55$. The critical value of $\bar{x}$ for the test is 

$$65 - 2.33\left(\frac{15}{\sqrt{20}}\right) = 57$$

We find $\beta$ as follows: 

$$
\begin{aligned}
\beta &= P(\bar{x} > 57 \mid \mu = 55) \\
&= P\left(z > \frac{57 - 55}{15/\sqrt{20}}\right) = P(z > 0.60) \\
&= 1 - 0.7257 = 0.2743
\end{aligned}
$$

Consequently $1 - \beta = 1 - 0.2743 = 0.7257$. Figure 7.12.3 shows the calculation 
of $\beta$. Similar calculations for other alternative values of $\mu$ also yield 
values of $1 - \beta$. When plotted against the values of $\mu$, these give the power 
curve shown in Figure 7.12.4. 


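A minimal Python sketch of the calculation in Example 7.12.2 follows. It is an illustration rather than part of the original text: `lower_tail_power` is a hypothetical helper name, the grid of alternative means is chosen only for display, and the critical $z$ is computed exactly instead of using the tabled $-2.33$ and the rounded critical value of 57, so the resulting $\beta$ and power differ somewhat from 0.2743 and 0.7257.

```python
from statistics import NormalDist


def lower_tail_power(mu0, mu1, sigma, n, alpha):
    """Return (critical value, beta, power) for a one-sided z test
    with the rejection region in the lower tail (H1: mu < mu0)."""
    std_normal = NormalDist()
    se = sigma / n ** 0.5                              # standard error of the mean
    critical = mu0 + std_normal.inv_cdf(alpha) * se    # inv_cdf(alpha) is negative
    # Power is the probability that the sample mean falls below the
    # critical value when the true mean is mu1
    power = std_normal.cdf((critical - mu1) / se)
    return critical, 1 - power, power


# Illustrative grid of alternative means (an assumption, not from the text)
for mu1 in (50.0, 52.5, 55.0, 57.5, 60.0, 62.5, 65.0):
    critical, beta, power = lower_tail_power(65, mu1, 15, 20, 0.01)
    print(f"mu1 = {mu1:5.1f}   beta = {beta:.4f}   power = {power:.4f}")
```

Because the rejection region is in the lower tail, the printed power rises as the alternative mean moves further below 65.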
An alternative way of evaluating a test is to look at its operating characteristic 
curve, or OC curve. When we construct an OC curve, we plot values of $\beta$, rather 
than $1 - \beta$, along the vertical axis. In other words, an OC curve is the complement 
of the corresponding power curve. 

Exercises 

Construct and graph the power function for each of the following situations. 

7.12.1 $H_0: \mu \leq 516$, $H_1: \mu > 516$, $n = 16$, $\sigma = 32$, $\alpha = 0.05$. 

7.12.2 $H_0: \mu = 3$, $H_1: \mu \neq 3$, $n = 100$, $\sigma = 1$, $\alpha = 0.05$. 

7.12.3 $H_0: \mu \leq 4.25$, $H_1: \mu > 4.25$, $n = 81$, $\sigma = 1.8$, $\alpha = 0.01$. 




FIGURE 7.12.4 

7.13 DETERMINING SAMPLE SIZE TO CONTROL BOTH TYPE I AND 
TYPE II ERRORS 

In Chapter 6 we learned how to find the sample size needed to construct an interval 
estimate for either a population mean or a population proportion with a specified 
confidence coefficient. In Section 7.3 we also learned that we can use confidence 
intervals to test hypotheses. Since a confidence coefficient is equal to $1 - \alpha$, the 
method of determining sample size that we learned in Chapter 6 takes into account 
the probability of a Type I error, but not a Type II error. 

In many applications of statistical inference, we want to consider Type II errors 
as well as Type I errors in determining sample sizes. To illustrate the procedure, 
let us refer again to Example 7.12.2. 

EXAMPLE 7.13.1 In Example 7.12.2, the hypotheses are 

$$H_0: \mu \geq 65, \qquad H_1: \mu < 65$$

The population standard deviation is 15, and the probability of a Type I error is 
set at 0.01. Now suppose that we want the probability of failing to reject $H_0$ ($\beta$) 
to be 0.05 if $H_0$ is false because the true mean is 55 rather than the hypothesized 
65. We wish to know how large a sample we need in order to realize, simultaneously, 
the desired levels of $\alpha$ and $\beta$. For $\alpha = 0.01$ and $n = 20$, $\beta$ is equal to 
0.2743. The critical value is 57. Under the new conditions the critical value is 
unknown. Let us call this new critical value $C$. We also let $\mu_0$ be the hypothesized 
mean and $\mu_1$ the mean under the alternative hypothesis. We can transform each 
of the relevant sampling distributions of $\bar{x}$, the one with a mean of $\mu_0$ and the one 
with a mean of $\mu_1$, to a $z$ distribution. Consequently we can convert $C$ to a $z$ 
value on the horizontal scale of each of the two standard normal distributions. 
When we transform the sampling distribution of $\bar{x}$ that has a mean of $\mu_0$ to the 
standard normal distribution, we call the $z$ that results $z_0$. When we transform the 
sampling distribution of $\bar{x}$ that has a mean of $\mu_1$ to the standard normal distribution, 
we call the $z$ that results $z_1$. Figure 7.13.1 represents the situation described so 
far. 

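We can express the critical value $C$ as a function of $z_0$ and $\mu_0$ and also as a 
function of $z_1$ and $\mu_1$. This gives the following equations: 

$$C = \mu_0 - z_0\frac{\sigma}{\sqrt{n}} \qquad (7.13.1)$$

$$C = \mu_1 + z_1\frac{\sigma}{\sqrt{n}} \qquad (7.13.2)$$

We can set the right-hand sides of these equations equal to each other and solve 
for $n$, to obtain 

$$n = \left[\frac{(|z_0| + |z_1|)\sigma}{\mu_0 - \mu_1}\right]^2 \qquad (7.13.3)$$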

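As a sketch of the algebra behind Equation 7.13.3, the steps can be written out as follows. This derivation assumes, as the example does, that $z_0$ and $z_1$ are both taken as positive and that $\mu_0 > \mu_1$, so that $C = \mu_0 - z_0\sigma/\sqrt{n}$ and $C = \mu_1 + z_1\sigma/\sqrt{n}$.

```latex
\begin{aligned}
\mu_0 - z_0\frac{\sigma}{\sqrt{n}} &= \mu_1 + z_1\frac{\sigma}{\sqrt{n}} \\
\mu_0 - \mu_1 &= (z_0 + z_1)\frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{(z_0 + z_1)\sigma}{\mu_0 - \mu_1} \\
n &= \left[\frac{(z_0 + z_1)\sigma}{\mu_0 - \mu_1}\right]^2
\end{aligned}
```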

To find $n$ for our illustrative example, we substitute into Equation 7.13.3. We 
have $\mu_0 = 65$, $\mu_1 = 55$, and $\sigma = 15$. From Table C, the value of $z$ that has 
0.01 of the area to its left is $-2.33$. The value of $z$ that has 0.05 of the area to 
its right is 1.645. Both $z_0$ and $z_1$ are taken as positive. We determine whether $C$ 
lies above or below either $\mu_0$ or $\mu_1$ when we substitute into Equations 7.13.1 and 
7.13.2. Thus we compute 

$$n = \left[\frac{(2.33 + 1.645)15}{65 - 55}\right]^2 = 35.55$$

We need a sample of size 36 to achieve the desired levels of $\alpha$ and $\beta$ when we 
choose $\mu_1 = 55$ as the alternative value of $\mu$. 

We now compute $C$, the critical value for the test, and state an appropriate 
decision rule. To find $C$, we may substitute known numerical values into either 
Equation 7.13.1 or 7.13.2. For illustrative purposes, we solve both equations for 
$C$. First we have 

$$C = 65 - 2.33\left(\frac{15}{\sqrt{36}}\right) = 59.175$$

From Equation 7.13.2, we have 

$$C = 55 + 1.645\left(\frac{15}{\sqrt{36}}\right) = 59.1125$$

The discrepancy between the two results is due to rounding error. 

The decision rule, when we use the first value of $C$, is as follows. 

Select a sample of size 36 and compute $\bar{x}$. If $\bar{x} \leq 59.175$, reject $H_0$. If $\bar{x} > 59.175$, 
do not reject $H_0$. 

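The sample-size and critical-value arithmetic of Example 7.13.1 can be reproduced with a short Python sketch. The helper name `sample_size_and_critical` is hypothetical, and the function assumes the lower-tail situation of this example ($\mu_0 > \mu_1$, with both $z$ values supplied as positive numbers from the table); it is a check on the arithmetic, not a general-purpose routine.

```python
import math


def sample_size_and_critical(mu0, mu1, sigma, z0, z1):
    """Sample size and critical value for a lower-tail z test (mu0 > mu1).

    z0 and z1 are the positive tabled z values for alpha and beta."""
    # Unrounded n: square of (z0 + z1) * sigma / (mu0 - mu1)
    n_exact = ((z0 + z1) * sigma / (mu0 - mu1)) ** 2
    n = math.ceil(n_exact)                 # round up to a whole observation
    se = sigma / math.sqrt(n)              # standard error with the rounded n
    c_from_null = mu0 - z0 * se            # critical value measured from mu0
    c_from_alt = mu1 + z1 * se             # critical value measured from mu1
    return n_exact, n, c_from_null, c_from_alt


n_exact, n, c0, c1 = sample_size_and_critical(65, 55, 15, 2.33, 1.645)
print(f"n before rounding: {n_exact:.2f}, n used: {n}")
print(f"C from the null side: {c0:.4f}, C from the alternative side: {c1:.4f}")
```

The two critical values differ slightly because $n$ is rounded up to a whole number before the standard error is computed.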

For the sake of brevity, we have limited our discussion of the Type II error 
and the power of a test to the case involving a population mean. These concepts 
may be extended to cases involving other parameters. 

For a discussion of Type II errors and OC curves for tests involving other 
parameters and for tests involving the mean when the population standard deviation 
is unknown, see the book by Bowker and Lieberman (1972). 

Exercises 

7.13.1 Refer to Exercise 7.12.1. Let $\beta = 0.10$ and $\mu_1 = 520$, and find $n$ and $C$. State 
the appropriate decision rule. 

7.13.2 Refer to Exercise 7.3.2. Let $\beta = 0.05$ and $\mu_1 = 4.52$, and find $n$ and $C$. State 
the appropriate decision rule. 

7.13.3 Refer to Exercise 7.12.3. Let $\beta = 0.03$ and $\mu_1 = 5.00$, and find $n$ and $C$. State 
the appropriate decision rule. 

Summary 

This chapter covered the basic concepts of hypothesis testing, the second type of 
statistical inference procedure. You learned that the hypothesis-testing procedure 
may be broken down into seven sequential steps. 

1. Statement of the hypotheses 

2. Identification of the test statistic and its distribution 

3. Specification of the significance level 

4. Statement of the decision rule 

5. Collection of the data and performance of the calculations 

6. Making the statistical decision 

7. Making the administrative decision 





You learned how to carry out the seven-step hypothesis-testing procedure when 
the parameter of interest is one of the following: 

1. The mean of a normally distributed population for which the population variance 
is known 

2. The mean of a normally distributed population for which the population variance 
is unknown 

3. The mean of a population that is not normally distributed (large-sample case) 

4. The difference between the means of two normally distributed populations 

5. The difference between the means of two populations that are not normally 
distributed (large-sample case) 

6. A population proportion (large-sample case) 

7. The difference between two population proportions (large-sample case) 

8. The variance of a normally distributed population 

9. The ratio of the variances of two normally distributed populations 

You learned that in many hypothesis-testing procedures, we may compute the 
test statistic using the general formula 

$$\text{Test statistic} = \frac{\text{sample statistic} - \text{value of hypothesized parameter}}{\text{standard error of the statistic}}$$

You also learned how to determine a p value for each test conducted. 

Finally, for the case in which the normal distribution is the appropriate sampling 
distribution, you learned how to compute the power of a statistical test for specified 
alternative values of the population mean and how to determine the sample size 
needed to control both Type I and Type II errors. 

For an examination of the subject of hypothesis testing at a more rigorous level, 
see the book by Lehmann (1959). 

Review Questions 

Where appropriate, carry out the seven-step hypothesis-testing procedure at the indicated 
level of significance and compute the $p$ value for the test. 

1. What is the purpose of hypothesis testing? 

2. What is a hypothesis? 

3. List and explain each step in the seven-step hypothesis-testing procedure. 

4. What is a Type I error? 

5. What is a Type II error? 

6. Explain how to decide what statement goes into the null hypothesis and what statement 
goes into the alternative hypothesis. 

7. What are the assumptions underlying the use of the t statistic in testing hypotheses 
about a single mean? the difference between two means? 

8. When may the $z$ statistic be used in testing hypotheses about: (a) a single population 
mean? (b) the difference between two population means? (c) a single population proportion? 
(d) the difference between two population proportions? 

9. In testing a hypothesis about the difference between two population means, what is 
the rationale behind pooling the sample variances? 


10. What is meant by the power of a test? 

11. Give an example from your field of interest in which it would be appropriate to test 
a hypothesis about the difference between two population means. Use real or realistic data 
and carry out the seven-step hypothesis-testing procedure. 

12. Do Exercise 11 for a single population mean. 

13. Do Exercise 11 for a single population proportion. 

14. Do Exercise 11 for the difference between two population proportions. 

15. Do Exercise 11 for a population variance. 

16. Do Exercise 11 for the ratio of two population variances. 

17. A manufacturer of strapping tape claims that the tape has a mean breaking strength 
of 500 psi. Experience has shown that breaking strengths are approximately normally 
distributed with a standard deviation of 48 psi. A random sample of 16 specimens is drawn 
from a large shipment of tape, and a mean of 480 psi is computed. Can we conclude from 
these data that the mean breaking strength for this shipment is less than that claimed by 
the manufacturer? Let $\alpha = 0.05$. 

18. The mean length of time required to perform a certain task on an assembly line has 
been established at 15.5 minutes, with a standard deviation of 3 minutes. A random sample 
of 9 employees is taught a new method. After the training period, the average time these 
9 employees take to perform the task is 13.5 minutes. Do these results provide sufficient 
evidence to indicate that the new method is faster than the old? Let $\alpha = 0.05$. Assume 
that the times required to perform the task are normally distributed. 

19. A certain type of yarn is manufactured under specifications that the mean tensile 
strength must be 20 lb. A random sample of 16 specimens yields a mean tensile strength 
of 18 lb and a standard deviation of 3.2 lb. Can we conclude from these data that the true 
mean tensile strength is less than 20 lb? Assume that the tensile strengths are approximately 
normally distributed. Let $\alpha = 0.05$. 

20. A manufacturer of electrical products will not accept a shipment of a certain part from 
a vendor if there is reason to believe that the mean resistance is not 70 ohms. A random 
sample of 25 selected from a large shipment yields a mean and standard deviation, 
respectively, of 66 and 10 ohms. Should the shipment be accepted? Let $\alpha = 0.05$. Assume 
that the resistances are approximately normally distributed. 

21. The credit manager of a department store chain believes that the average age of charge- 
account customers is less than 30 years. A random sample of 100 charge-account customers 
reveals a mean age of 27 years and a standard deviation of 10 years. Do these data provide 
sufficient evidence to support the credit manager’s belief? Let $\alpha = 0.05$. 

22. A random sample of size 81 gives a mean and standard deviation, respectively, of 
485 and 45. (a) Test the null hypothesis that $\mu = 500$. Let $\alpha = 0.01$. (b) Test the null 
hypothesis that $\mu \geq 500$ ($\alpha = 0.01$). 

23. We test two brands of electric fuse by subjecting each to a fixed load and measuring 
the subsequent life of the fuse in seconds. We find the following test results. Can we 
conclude from these data that Brand B fuses have longer life, on the average, than Brand 
A fuses? Let $\alpha = 0.05$. What assumptions are necessary in order to carry out a valid 
hypothesis test? 

Brand A   $n = 7$    $\bar{x} = 75$   $s^2 = 20$ 
Brand B   $n = 10$   $\bar{x} = 85$   $s^2 = 16$ 



24. The following results are based on independent simple random samples drawn from 
two normally distributed populations with variances $\sigma_1^2 = 135$ and $\sigma_2^2 = 91$. Can we 
conclude from these data that $\mu_2 < \mu_1$? Let $\alpha = 0.05$. 

Sample 1   $n = 15$   $\bar{x} = 62$ 
Sample 2   $n = 13$   $\bar{x} = 50$ 

25. Explain the conditions under which a paired comparison test is appropriate. 

26. The following data are obtained from independent simple random samples from two 
populations. Can we conclude from these data that the population means are different? Let 
$\alpha = 0.01$. 

Population A   $n = 50$   $\bar{x} = 100$   $s^2 = 650$ 
Population B   $n = 50$   $\bar{x} = 107$   $s^2 = 600$ 



27. An official of a large paint factory believes that more than one-third of ordered raw 
materials are not delivered on time. She compares actual delivery date with promised 
delivery date on a random sample of 100 orders, and finds that 38 orders were not delivered 
on time. Do these data support her belief? Let $\alpha = 0.01$. 

28. A simple random sample of size 210 yields a $\hat{p}$ of 0.7. Test the null hypothesis that 
$p = 0.75$. Let $\alpha = 0.01$. 

29. It is hypothesized that the proportion of executives reared in cities of 100,000 population 
or less is greater for Industry A than for Industry B. Do the following sample results 
support this hypothesis at the 0.01 level? 

                                          Industry A    Industry B 
Sample size                               150           100 
Number of executives reared in 
cities of 100,000 population or less      78            48 


30. An opinion poll firm has two mailing lists available for the distribution of a questionnaire. 
A simple random sample of 200 names and addresses is selected from each list, 
and a questionnaire covering general-interest topics is mailed to each person. Of the questionnaires 
sent to the sample from Mailing List A, 52% are returned, whereas only 40% 
of those sent to the B sample are returned. Do these data provide sufficient evidence to 
indicate that people on Mailing List A are more apt to respond to this type of questionnaire? 
Let $\alpha = 0.05$. 

31. A manufacturing plant uses flexible hose extensively. The critical characteristic of the 
hose is its ability to withstand high temperatures. A new brand of hose is being considered 
to replace the brand now being used. When 25 specimens of the old brand and 25 of the 
new brand are tested, the mean temperature at failure for the old brand is lower than for 
the new brand. The variance for the old brand, however, is 1790, and the variance for the 
new brand is 3625. Do these data provide sufficient evidence to indicate a greater variability 
for the new brand? Let $\alpha = 0.01$. 

32. According to specifications, the variance of the shear strength of a particular spot 
weld must be 324 lb$^2$ or less. A random sample of 11 welds tested for shear strength gives 
a variance of 400 lb$^2$. On the basis of these data, should we conclude that the specifications 
are not being met? Let $\alpha = 0.05$. What assumption must you make to validate your 
method? 



33. A random sample of 64 bank depositors reveals a mean checking account balance of 
\$375 with a standard deviation of \$80. Can we conclude from these data that the population 
mean is less than \$400? Let $\alpha = 0.01$. 

34. A sample of 9 high school seniors in a school system reports a mean of 5 hours worked 
at part-time jobs during a recent week. The sample standard deviation was 3 hours. Do 
these data provide sufficient evidence to indicate that the mean for the population is less 
than 8 hours? Assume a normally distributed population. 

35. Draw a simple random sample of 30 one-digit numbers from Table D of the Appendix. 
Test the null hypothesis that $\mu = 4.5$ at the 0.05 level. Let $\sigma^2 = 8.25$. Compare your 
results with those of the other members of your class. Repeat the exercise, but this time 
use the sample variance in computing the standard error. Compare the results from the 
two procedures. 

36. The owner of a shopping center claims that more than 50% of the households within 
a 3-mile radius of the shopping center have at least one member who shops at the center 
at least once a week. In a sample of 300 households in the area, an investigator found that 
members of 171 households did so. Do these data provide sufficient evidence to support 
the shopping center owner’s claim? Let $\alpha = 0.01$. 

37. Noting the declining popularity of public billiards parlors, a recreation specialist 
hypothesizes that more than 10% of the homes in a certain area have pool tables. In a random 
sample of 100 homes in the area, 18 are found to have pool tables. Do these data support 
the recreation specialist’s contention? Let $\alpha = 0.05$. 

38. Draw a simple random sample of 20 one-digit numbers from Table D. Compute $\hat{p}$ = 
number of odd digits/20. Test the null hypothesis that $p = 0.5$. Let $\alpha = 0.05$. Compare 
your results with those of the other members of your class. 

39. A researcher with a commercial nursery conducts an experiment to compare the characteristics 
of two kinds of tomato plants. The heights at a certain age are determined for 
a sample of each of the two kinds grown under conditions as near identical as possible. 
The results are as follows. Do these data provide sufficient evidence to indicate that Type 
B plants are, on the average, taller than Type A plants at this age? Let $\alpha = 0.05$. For 
Type A, $n = 15$, $\bar{x} = 11.5$, and $s = 3.2$. For Type B, $n = 22$, $\bar{x} = 13.2$, and $s = 3.8$. 

40. A psychologist investigates the differences between high-performing and low-performing 
salespersons with respect to certain psychological factors. A random sample is selected 
from each of the two groups, and sampled subjects are given a battery of tests. The results 
of one such test, designed to measure subjects’ need for security, are as follows. Can we 
conclude from these data that the two populations differ with respect to mean level of need 
for security? Let $\alpha = 0.05$. For the 16 high performers, $\bar{x} = 4.75$ and $s^2 = 2.25$. For 
the 21 low performers, $\bar{x} = 3.25$ and $s^2 = 2.00$. 

41. Draw two simple random samples of one-digit numbers from Table D, letting 
$n_1 = n_2 = 30$ and $\sigma_1^2 = \sigma_2^2 = 8.25$. Test the null hypothesis that $\mu_1 - \mu_2 = 0$ at the 
0.05 level of significance. Compare your results with those of other members of your 
class. Repeat the exercise, but this time use sample variances in computing the standard 
error. Compare the results from the two procedures. 

42. A factory manager wishes to know whether the efficiency of employees working in a 
high-noise area could be improved by reducing the noise level. The following table gives 
efficiency ratings taken before and after noise-reduction measures were introduced for 15 
affected employees. Can we conclude from these data that reducing the noise level raises 
the efficiency level of employees? A higher number indicates a higher efficiency. Let 
$\alpha = 0.01$. 

            Efficiency rating                Efficiency rating 
Employee    Before    After     Employee     Before    After 
1           21        32        9            22        40 
2           35        35        10           35        48 
3           40        58        11           28        38 
4           38        57        12           20        33 
5           23        37        13           39        39 
6           27        40        14           28        41 
7           28        39        15           34        44 
8           39        58 





43. A sample of 12 pairs of brothers, no more than two years apart in age, take part in a 
study conducted by a high school guidance counselor. During his senior year in high 
school, each boy is given a business-knowledge test. The younger brothers, when they are 
seniors, all take a course in business practices that was not taken by the older brothers 
when they were seniors. The following table shows the scores. Should we conclude from 
these data that the course in business practices raises the level of a student’s knowledge 
of business? Let $\alpha = 0.01$. 

        Older      Younger              Older      Younger 
Pair    brother    brother      Pair    brother    brother 
1       104        113          7       150        151 
2       223        214          8       143        146 
3       241        246          9       205        210 
4       103        104          10      185        191 
5       145        150          11      104        111 
6       156        160          12      225        234 


44. A random sample of 500 is selected from the subscribers to a sports magazine. The 
sample is then divided at random into two subsamples, A and B, of 250 each. Each subject 
is mailed a questionnaire seeking his or her opinions of certain sports teams. Each subject 
in Subsample A is sent a dollar bill with the questionnaire. Subjects in Subsample B are 
sent the questionnaire only. Then 212 persons in Subsample A and 150 in Subsample B 
return a completed questionnaire. Do these data provide sufficient evidence to indicate that 
paying people causes an increase in the rate of response to mailed questionnaires? Let 
$\alpha = 0.05$. 

45. An ad agency testing its commercials inserts a Format A test commercial for a detergent 
into the normal Monday morning programming of a local radio station. The next day 
the agency telephones 100 listeners. Asked whether they recall the commercial, 25 out of 
the 100 say they do recall it. The following Monday morning the ad agency inserts a 
Format B commercial for the same detergent into the radio station’s normal programming. 
The next day the agency follows it up with telephone calls to 110 listeners. Of these, 40 
were able to recall the Format B commercial. Do these data provide sufficient evidence 
to indicate that Format B is more easily recalled than Format A? 

46. A random sample of households is selected from each of two communities, A and B. 
Each head of household is asked the question, “Is anyone in this household bothered by 
air pollution?” In Community A, 80 out of 240 answer yes. In Community B, 90 out of 
250 answer yes. Do these data provide sufficient evidence to indicate a difference in 
population proportions between the two communities? 

47. An advertising executive believes that the proportion of adult females in Area A who 
regularly watch a certain soap opera on television exceeds by more than 0.10 the proportion 
in Area B who regularly watch it. Independent random samples of adult females from the 
two areas give the following information. Do these data provide sufficient evidence to 
support the advertising executive’s belief? Let $\alpha = 0.05$. In Area A, the sample size 
$n_A = 150$, and the number of respondents who regularly watch the program is 98. In Area 
B, the sample size $n_B = 200$, and the number who watch regularly is 80. 

48. An industrial psychologist with a large company believes that a certain employee 
orientation program will reduce the turnover rate among new employees by more than 
15%. During a certain year, by random assignment, 100 new employees are chosen to 
participate in the orientation program. Another 100 new employees, by random assignment, 
are chosen as a control group. Both groups are followed for a period of five years. At the 
end of this time, 22 persons in the experimental group and 45 in the control group have 
left the firm. Is the psychologist’s belief about the orientation program justified? Let 
$\alpha = 0.05$. 

49. A chemist with a pest-control company believes that the variance of the life of a 
termite exposed to a poison is 625 min$^2$. A random sample of 11 termites yields a variance 
of 1225. Do these data provide sufficient evidence to indicate that the chemist’s assessment 
of the variability is wrong? Let $\alpha = 0.05$. 

50. A time-and-motion expert believes that the variance of the time clerical employees 
need for a certain task is 9 min$^2$. A random sample of 6 employees who perform the task 
yields a sample variance of 25. Do these data provide sufficient evidence to indicate that 
the variance is greater than the time-and-motion expert believes? Let $\alpha = 0.05$. 

51. Two groups of executives are given a test to measure their levels of extroversion. 
Group I consists of 25 executives who started their careers as salespersons. Group II 
consists of 31 executives who started their careers as accountants. The variances computed 
from the sample data are $s_1^2 = 81$ and $s_2^2 = 36$. Do these data suggest, at the 0.05 level 
of significance, that the population of scores represented by group I is more variable than 
that represented by group II? Let $\alpha = 0.05$. State all necessary assumptions. 

52. A random sample of 16 college freshmen who plan to major in marketing and a 
random sample of 13 who plan to major in accounting are given a sales aptitude test. The 
variance of the scores of the marketing majors is 7.29. The variance of the scores of the 
accounting majors is 39.69. Do these data provide sufficient evidence to indicate, at the 
0.01 level of significance, that the two population variances are different? State all 
necessary assumptions. 

53. A drug manufacturer wants to know whether two methods of producing headache 
tablets result in a difference in mean thickness. A researcher draws a random sample from 
the items produced by the two methods, and records the following results (coded for 
computational convenience). 

Method A 39 46 35 38 36 45 42 54 52 55 

Method B 50 41 44 47 51 43 57 40 51 43 44 51 60 59 40 


Do these data provide sufficient evidence to indicate that the two population means are 
different? Let $\alpha = 0.05$. 


54. Simple random samples were selected from among male factory workers in two 
industries. The variable of interest was a measure of lung health. The results were as follows. 

Industry A   3.44 3.81 2.05 3.01 2.42 2.12 2.83 3.26 3.69 2.46 2.72 3.39 
             2.64 3.65 3.64 3.65 

Industry B   3.94 2.96 4.14 2.55 3.52 2.92 2.92 3.33 2.62 3.76 3.94 4.19 
             2.62 4.31 2.55 3.51 4.31 4.15 3.51 3.14 3.89 

Can we conclude from these data that the two population means differ? Let $\alpha = 0.10$. 





55. A firm makes soap using two different formulas. The firm wants to know the specific 
gravity of the soap produced by the two formulas. To compare the two formulas with 
respect to specific gravity of the product, a chemist draws simple random samples from 
production lots representing the two formulas. The results (coded for computational 
convenience) were as follows. 


Formula A 4 6 4 8 5 3 2 7 6 4 

Formula B 7 5 8 6 6 10 10 9 10 6 9 9 8 8 4 





Can we conclude from these data that the two population means are different? Let 
$\alpha = 0.05$. 

56. Select a simple random sample of size 50 from the population of employed heads of 
households in Appendix II. Perform a hypothesis test to see whether you can conclude that the 
proportion of single persons in the population is greater than 0.20. Let $\alpha = 0.05$. 

57. In a study comparing the attitudes of white-collar and blue-collar workers toward paid 
religious holidays, researchers with a large firm selected a random sample of 150 white- 
collar workers and an independent random sample of 120 blue-collar workers. Of these, 
29 of the white-collar and 34 of the blue-collar workers said that they thought paid religious 
holidays were very important. Do these data provide sufficient evidence to indicate, at the 
0.05 level, that the proportions of workers who think paid religious holidays are important 
are different in the two sampled populations? 

58. Consider the population of employed heads of households in Appendix II. Select a 
simple random sample of size 30 from this population. Perform an appropriate hypothesis 
test to see whether you can conclude that the mean age of the subjects in the population 
is greater than 30. Let α = 0.05. Compare your results with those of your classmates.

59. An industrial psychologist believes that the mean test score for manual dexterity of a 
population of employees with a certain handicap is greater than 75. The population of 
scores, which has a standard deviation of 9, is assumed to be normally distributed. A 
random sample of 20 of these employees yielded the following data: 77, 99, 96, 89, 85,
63, 51, 52, 54, 81, 91, 69, 91, 92, 98, 70, 53, 76, 90, 64. Do these data provide sufficient
evidence to support the psychologist’s belief? Let α = 0.01.

60. A drug manufacturer is concerned with the side effects of a depressant drug when 
used by normal adults. A researcher would like to find a dosage that would produce 
sedation, but not be strong enough to cause serious side effects. A random sample of 16 
subjects taking part in an experiment with the drug achieved sedation with the following 
dosages, in milligrams per kilogram of body weight: 1.6, 1.8, 7.3, 5.7, 3.0, 1.6, 3.8, 
3.1, 7.8, 7.4, 4.2, 1.6, 2.1, 2.1, 5.5, 4.4. Do these data provide sufficient evidence to 
indicate that the mean dosage required to produce sedation is greater than 2.5 milligrams 
per kilogram of body weight? Let α = 0.05.

61. Two researchers wish to know if they can conclude that high school seniors with a 
high aptitude for a career in law have higher IQs than seniors with a low aptitude for a 
law career. The subjects of their study consist of 12 pairs of seniors. Each pair was matched 
on as many relevant variables as possible. The subjects within each pair differed with 
respect to their aptitude for a law career. The following table shows the IQs of the sample 
subjects. Do these data provide sufficient evidence to indicate that seniors with a high 
aptitude for law have, on the average, higher IQs than those who have a low aptitude for 
it? Let α = 0.05.


Pair              1    2    3    4    5    6    7    8    9   10   11   12

High aptitude   129  103  123  118   99   95  126  115  110  122  127  135

Low aptitude    127   94  115  114   90   92  129  105  101  110  125  134


62. The amount of a certain chemical in the raw material used to produce linoleum is a 
critical factor in the linoleum’s durability. A researcher for a linoleum manufacturer believes
that the mean concentration of the chemical is different in the raw material obtained
from two suppliers. To find out whether or not this belief can be supported by objective 
data, the researcher takes random samples from the raw material provided by the two 
suppliers, and determines the concentration of the chemical in each specimen. The results 
are as follows. 


Supplier A 

60.9 

49.8 

65.3 

40.6 

51.6 

69.7 

58.0 

59.6 

47.8 

46.9 

47.6 

67.3 

Supplier B 

72.9 

67.3 

81.4 

89.4 

86.5 

51.1 

72.9 

74.0 

77.8 

86.4 

82.0 

77.6 


74.8 

50.7 

61.0 

57.4 

61.0 

57.8 

74.7 

89.4 






Do the data support the researcher’s belief? Let α = 0.05. State any assumptions that are
necessary. 

63. Researchers give each of a random sample of 15 employees with high absenteeism 
records (Group A) a test to measure level of hostility. They give the same test to an 
independent random sample of 22 employees with low absenteeism records (Group B). 
The results are as follows. 


Group A   62  93  71  90  69  90  71  76  86  71  81  84  65  61  69

Group B   55  56  57  60  48  60  53  65  64  46  41  67  66  64  42  59  70
          75  69  72  74  55












Do these data provide sufficient evidence to indicate that, on the average, employees who 
are often absent are more hostile than employees who are not? A high score indicates a 
high level of hostility. Let α = 0.01. What use can the researchers make of their findings?
What assumptions are necessary? 

64. A market research firm wants to find out whether the annual average household consumption
of diet mayonnaise differs in two large market areas. The firm selects random
samples of 100 households in each area, with the following results.

Area 1: $\bar{x}_1 = 10$ units, $s_1 = 6$

Area 2: $\bar{x}_2 = 14$ units, $s_2 = 8$


What should the firm conclude from these results? Let α = 0.01. What use can the
researchers make of these results? 

65. A sample of 100 orders received during a year by a mail-order house specializing in 
hobby and craft supplies showed the following receipts, rounded to the nearest dollar. 


 8  12   9  14   8  10   7  17  18  18
20  10  23  12  27  11  27  15  15  16
21  14  22  14  29  28  26  16  19  24
21  14  23  21  28  27  29  18  19  32
23  13  24  22  28  27  32  16  19  36
24  10  21  21  27  29  31  16  19  36
22  13  22  22  25  26  32  15  15  37
23  14  21  24  26  34  34  15  16  34
23  14  20  24  27  37  33  19  19  33
 9  11   9  21   6  38   9  18  18  32


Can we conclude on the basis of these data that the mean value of the company’s receipts 
is greater than $20? Let α = 0.05.
Popular Record Marketing


Two researchers in marketing, Meenaghan and Turnbull,* reviewed the theory 
of product life cycle with respect to a specific product. They conducted a research
project to determine the applicability of the theory to popular records.

As part of the study, they collected extensive information on a sample of 12 
records judged (on the basis of certain well-defined criteria) to be successes 
and a sample of 10 judged to be failures. They collected data on each record 
for a period of 16 weeks from its date of release. One item of information they 
collected on each was a measure of radio airplay. Measurement of this variable 
yielded the following means and standard deviations for the two samples of 
records. 


Successes: $\bar{x} = 179,595$, $s = 54,231$

Failures: $\bar{x} = 15,268$, $s = 21,722$


Can one conclude, on the basis of these data, that successful and unsuccessful 
records differ with respect to mean amount of airplay? Let α = 0.01. Find the
p value for the test. What assumptions are required? 

*A. Meenaghan and Peter W. Turnbull, "The Application of Product Life Cycle Theory to Popular Record 
Marketing,” European Journal of Marketing, 15, 5 (1981), 1-50. 


Raising Finance and Corporate Reporting Policies 


The main purpose of a corporate annual report is to give information about 
the company's affairs to outside persons. Michael Firth* conducted a study that 
examined the changes a company makes in the quality and extent of voluntary 
financial disclosure in its annual report when it is trying to find additional 
capital in the stock market.

Firth paired each money-raising company with a control (non-money-raising) 
firm of the same size, in the same industry. He hypothesized that the change 
in the extent of financial disclosure in corporate reports of companies raising 
money in the stock market is greater than that of control companies not concerned
with doing so. He calculated a disclosure-index score for each firm in
the matched samples by assigning points for each of 48 items disclosed in the 
firm's annual report. 

He examined three paired samples of British firms: (1) firms that made new 
issues (those that raised equity capital on the stock market for the first time), 


*Michael Firth, "Raising Finance and Firms' Corporate Reporting Policies,” Abacus, 16 (December 1980),
100-115. 




(2) small firms that made rights issues (that is, that sought additional capital 
on the stock market), and (3) large firms that made rights issues. Firth compared 
the annual reports of these firms at three different times: (a) three years prior 
to issuing stock (t - 3), (b) one year prior to issuing stock (t - 1), and (c) 
immediately after issuing stock ( t ). 

Suppose that you wish to repeat the study with a sample of 12 firms making 
new stock issues (and their matched control firms). Let us say that you use 
Firth's index, with the following results. (A higher score indicates greater disclosure.)
For each of the time periods, state appropriate hypotheses, analyze
the data, and state your conclusions. Let α = 0.05 for all tests. What assumptions
do you need to make?




                  Disclosure Index Score at Time

        t - 3                   t - 1                     t
Equity-                 Equity-                 Equity-
raising     Control     raising     Control     raising     Control
firm        firm        firm        firm        firm        firm

15.38       15.00       20.35       15.10       21.22       15.12
 5.21        5.50       10.00        5.55       11.54        5.70
17.78       17.95       21.98       18.00       23.90       18.10
 6.31        6.30       11.30        6.30       12.41        7.35
 6.21        6.22       11.32        6.22       12.11        7.10
16.77       17.00       21.78       17.20       23.02       18.25
16.05       16.00       21.15       16.00       23.10       17.00
 8.24        8.35       13.27        8.42       14.24        8.45
 7.52        7.00       12.00        7.10       13.10        8.60
17.67       18.00       23.67       18.10       24.21       19.20
 9.24       10.00       14.50       10.25       16.00       10.50
12.47       12.45       17.30       12.00       18.30       12.10




8. Analysis of Variance 


Chapter Objectives: Now that you have learned the
basic concepts and techniques of statistical inference,
you can use these ideas and skills in more complex situations.
In this chapter you learn to test the null hypothesis
that several population means are equal. To do this,
you will use a technique known as analysis of variance.

After studying this chapter and working the exercises, 
you should be able to: 

1. Describe the following experimental designs and use 
the appropriate analysis-of-variance technique to analyze
the data generated by these designs: (a) the
completely randomized design, (b) the randomized 
complete block design, (c) the Latin square design, (d) 
the factorial experiment 

2. Test for a significant difference between individual 
pairs of sample means 



8.1 INTRODUCTION 


We may view the preceding chapters, which covered the basic concepts and 
techniques of descriptive and inferential statistics, as providing the foundation for 
this and later chapters. The objective of this portion of the book is to help you 
understand some of the more widely used tools of statistical analysis. 

This chapter is concerned with analysis of variance, which is defined as follows. 

Analysis of variance is a technique whereby the total variation present in a set 
of data is partitioned into several components. Associated with each of these 
components is a specific source of variation, so that in the analysis, it is possible 
to ascertain the magnitude of the contribution of each of these sources to the 
total variation. 

The introduction and development of the techniques of analysis of variance are 
due to R. A. Fisher, whose contributions over the years 1912 to 1962 had a 
tremendous influence on modern statistical thought. See, for example, Fisher
(1950, 1973, 1966). 

Analysis of variance is most often used to analyze data derived from designed 
experiments. Its use, however, is not restricted to this type of analysis. As some 
of the examples and exercises of this chapter show, we can also use analysis-of- 
variance techniques to analyze data from surveys. 

The principles involved in designing experiments are covered in a number of 
texts, including those by Chew (1958), Cochran and Cox (1968), Cox (1958), 
Davies (1978), Federer (1955), Finney (1955, 1976), Fisher (1971), Hicks (1973), 
John (1971), Kempthorne (1952), Kirk (1968), Lee (1975), Li (1964), Lindman 
(1974), Mendenhall (1968), Montgomery (1976), Neter and Wasserman (1974), 
Peng (1967), Scheffe (1959), and Winer (1971). This chapter does not treat the 
subject thoroughly. However, we shall touch on many of the important concepts 
of experimental design. 

When we design experiments with an analysis in mind—before we conduct the 
experiment—we identify those sources of variation that we consider important. 
We then choose a design that will let us measure the extent to which these sources 
contribute to the total variation. 

We use analysis of variance to estimate and test hypotheses about both population
variances and population means. Although this text deals with testing hypotheses
about population means, the conclusions depend on the magnitudes of
the observed variances.

The valid use of analysis of variance depends on a set of fundamental assumptions.
We will state these briefly in the sections that follow. Eisenhart (1947) gives
a thorough discussion of these assumptions. Not all the assumptions will be met 
perfectly in a given situation. Thus it is important to be aware of the underlying 
assumptions and to be able to recognize serious departures from them. The consequences
of the failure to meet these underlying assumptions are spelled out by
Cochran (1947), who suggests that analysis-of-variance results be considered as 
approximate rather than exact, because experiments in which all the assumptions 
are perfectly met are so rare. 



We shall discuss analysis of variance in the context of three different experimental
designs: the completely randomized, the randomized complete block, and
the Latin square. We shall present the concept of a factorial experiment through 
its use in a completely randomized design. The experimental design texts cited 
earlier present additional designs. 

We shall use the following six-step format to present each analysis of variance: 

1. Model. The model consists of a symbolic representation of a typical value from
the data under analysis. 

2. Assumptions. Each model has a specific set of assumptions that will be listed. 

3. Hypotheses. We shall state the null and alternative hypotheses that will be
tested under the model. 

4. Calculations. We shall explain the necessary arithmetic calculations. 

5. ANOVA table. We summarize the arithmetic calculations in the analysis-of- 
variance (ANOVA) table, which facilitates the assessment of the results of the 
analysis. 

6. Decision. We make a statistical decision as to whether the null hypothesis 
should be rejected or not rejected. Any administrative decision will be influenced 
by the statistical decision. 

In order to facilitate this discussion, let us now define two terms. We shall 
define other terms as we introduce them. 

The term treatment is broadly used in the design of experiments. It can refer 
to any factor that the experimenter controls. It may refer, for example, to a type 
of drug, one of several concentrations of a single drug, a new type of house paint, 
an advertising technique, or a particular training program. The term has its origins 
in the early days of analysis of variance, when different groups had different 
treatments (in the usual sense of the word) applied to their respective experimental 
units. 

We call an entity that receives a treatment the experimental unit. The experimental
unit may be, for example, an individual, a single white mouse, a group
of white mice, a plot of ground, a segment of the consuming public, a group of 
trainees, or an item of production. We may also think of it as that entity on which 
we take a measurement in order to obtain a value for the variable of interest. 


8.2 THE COMPLETELY RANDOMIZED DESIGN 

When we use the completely randomized design, we assign the treatments at
random to the experimental units. Suppose, for example, that we want to road- 
test four brands of tires, A, B, C, and D, to determine whether there are any 
differences among the brands with respect to expected tire mileage. We can assign 
10 tires of each brand at random to the 40 rear wheels of 20 cars. We can then 
drive the cars until a predetermined amount of tread wear occurs. At that time we 
record the number of miles driven. We then use an analysis of variance to decide 



whether the brands differ with respect to expected tire mileage. The two sources 
of variation that we isolate are variation due to treatment (brand) differences and 
residual variation, which measures the variation resulting from all sources other 
than the tire brands. 
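
The random assignment in the tire example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_tires`, the wheel labels, and the fixed seed are hypothetical and do not come from the text. The sketch simply shuffles the 40 tires (10 of each brand) over the 40 rear wheels of 20 cars.

```python
import random

def assign_tires(brands=("A", "B", "C", "D"), tires_per_brand=10, n_cars=20, seed=1):
    """Randomly assign tires to rear wheels (completely randomized design)."""
    # One label per tire: 10 tires of each of the 4 brands gives 40 tires.
    tires = [brand for brand in brands for _ in range(tires_per_brand)]
    # Experimental units: left and right rear wheel of each car (hypothetical labels).
    wheels = [(car, side) for car in range(1, n_cars + 1) for side in ("left", "right")]
    if len(tires) != len(wheels):
        raise ValueError("need exactly one tire per wheel")
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    rng.shuffle(tires)         # every ordering of the tires is equally likely
    return dict(zip(wheels, tires))

assignment = assign_tires()
```

Each wheel receives exactly one tire, and each brand still appears 10 times; only the pairing of tires with wheels is left to chance.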

We analyze data from an experiment using the completely randomized design 
by what is known as the one-way analysis of variance. This is so called because 
we classify the experimental units (and consequently the measurements obtained) 
according to only one criterion—the treatment group to which they belong. 

We may also use the one-way analysis of variance to analyze data from a sample 
survey in which we draw a random sample from each of several populations. In 
fact, the one-way analysis of variance, which enables one to test for a significant 
difference among several means, is an extension of the t test for the difference 
between two means (discussed in Chapter 7). 

Here is an example of the type of business problem for which one-way analysis 
of variance would be appropriate. 

EXAMPLE 8.2.1 A plastics manufacturer wants to know what effect three formula
ingredients have on elasticity of the product. Each of the ingredients is randomly
assigned to batches of experimental material. Table 8.2.1 shows the results of
elasticity tests (in coded form) made on each specimen of the product. The manufacturer
wishes to know whether the ingredients have a differential effect on the
elasticity of the plastic.

We now give a general detailed discussion of the six-step one-way analysis-of- 
variance procedure, then apply it step by step to Example 8.2.1. [Here we shall 
use double summation notation. For a review of this topic, see Appendix III.] 

1. Model. The model is a symbolic representation of a typical value from a set
of data. We may identify such a value by the symbol $x_{ij}$, where the subscript ij
indicates the ith value from the jth group. Within a given group (population), a
specific value is equal to the mean $\mu_j$ of the group plus some amount that prevents
the individual value from being equal to the mean (unless the amount is 0, in
which case the individual value is equal to the mean).

In other words, if we add some amount (which may be negative, positive, or
0) to the mean, we get the particular value $x_{ij}$. We may call this amount the error
and designate it by $e_{ij}$. We can write this relationship as

$x_{ij} = \mu_j + e_{ij}$ (8.2.1)


TABLE 8.2.1 Elasticity of plastic produced with three different formula ingredients (coded data)

Ingredient   Observations                              Total   Mean

A             5   6   5   8   6   7   6   5   6   7             61     6.1
B             8   9   8   7   9   9  10   8                     68     8.5
C            10  10   9   8   8   9  10   9   8   9  10   8    108     9.0

Grand total = 237
Grand mean = 7.9




Solving for $e_{ij}$, we have

$e_{ij} = x_{ij} - \mu_j$ (8.2.2)

Suppose that there are k finite populations of equal size. We can obtain the
grand mean $\mu$ of all the observations in the k populations together by calculating
the mean of the k population means:

$\mu = \sum_{j=1}^{k} \mu_j / k$ (8.2.3)

We may designate the difference between any $\mu_j$ and $\mu$ as

$\tau_j = \mu_j - \mu$ (8.2.4)

We usually refer to this term as the jth group effect, or jth treatment effect. It is
a measure of the average effect that the jth treatment has on an individual observation.

We may solve Equation 8.2.4 for $\mu_j$ to obtain

$\mu_j = \mu + \tau_j$ (8.2.5)

Substituting the right-hand portion of Equation 8.2.5 into 8.2.1, we have

$x_{ij} = \mu + \tau_j + e_{ij}$ (8.2.6)

and the model is specified.

Thus, a typical observation from the set of data under study is composed of the
grand mean $\mu$, a treatment effect $\tau_j$, and an error term that represents the deviation
of the observation from its group mean.
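
A small numerical sketch may make the decomposition concrete. The three population means and the single observation below are hypothetical values chosen only for illustration; they are not the data of Example 8.2.1. The code applies Equations 8.2.2 through 8.2.4 and checks that the three pieces add back to the observation.

```python
# Hypothetical population means for k = 3 groups (illustrative values only).
group_means = {"A": 6.0, "B": 8.0, "C": 10.0}

# Grand mean: the mean of the k population means (Equation 8.2.3).
grand_mean = sum(group_means.values()) / len(group_means)

# Treatment effects: each group mean minus the grand mean (Equation 8.2.4).
effects = {g: m - grand_mean for g, m in group_means.items()}

def decompose(x, group):
    """Split one observation into (grand mean, treatment effect, error)."""
    error = x - group_means[group]  # Equation 8.2.2
    return grand_mean, effects[group], error

# A hypothetical observation of 7.0 drawn from group A.
mu, tau, e = decompose(7.0, "A")
```

With these values the grand mean is 8, the effect of group A is -2, and the error is 1, so the observation 7 is recovered as 8 + (-2) + 1. The treatment effects also sum to 0.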

2. Assumptions. The assumptions depend on the manner in which we select the 
treatments. We may identify two cases. We usually refer to them as the fixed-effects
model, or model I, and the random-effects model, or model II. We use the
fixed-effects model when we are interested in the k populations represented by 
the sample data. And we use the random-effects model when we consider these 
k populations to be a sample of size k from a population of treatments. 

In the fixed-effects model, our inferences are limited to the specific treatments 
that appear in the experiment. The following are some examples of situations in 
which the fixed-effects model applies. 

(a) We have 3 methods of teaching management skills to department supervisors. 
Trainees are randomly assigned to one of the methods. At the end of the training 
period, we compare the mean scores of the 3 groups and make inferences about 
the 3 methods’ relative effectiveness. 

(b) We have 5 kinds of fertilizer. Each is used to fertilize 10 randomly selected 
tomato plants. We measure the mean yield of each group of 10 plants and make 
inferences about the relative quality of the 5 fertilizers. 

(c) We have 4 factories. We select a random sample of employees from each. 
Then we determine the mean amount of time these employees spend per day 



watching television. We wish to make inferences about the equality of the 4 
population means. 

When we randomly assign treatments to experimental units, we may use the 
sample results to make inferences about causation. For example, in situation (b), 
the 5 fertilizers are randomly assigned to the tomato plants. Thus we may, as we 
said earlier, make inferences about their relative quality. Situation (c), however, 
is different. In that case factories are not randomly assigned to employees. Therefore
our inference is limited to a statement about the difference among the population
means. We cannot infer that the factories are the cause of any observed
differences among means. 

In the random-effects model, the populations represented in the experiment are 
a sample of populations from a larger set of populations. The following are examples
of situations in which the random-effects model is applicable.

(a) We have 50 kinds of fertilizer. We select a random sample of 5 for an experiment.
We wish to make an inference about the entire set of 50 fertilizers based
on the performance of the 5 in our experiment. 

(b) We have 200 factories. We select a random sample of 10 factories. We wish 
to use the results of a survey in these 10 factories to make inferences about the 
set of 200 factories. 

(c) We have 50 drugs that are potential competitors for the treatment of a certain 
disease. We randomly select 6 for use in an experiment. Our objective is to draw 
conclusions about the set of 50 drugs. 

The calculations are identical, regardless of the model. However, we make a 
distinction in the interpretation of the results when the parameters of interest are 
means. We shall assume the fixed-effects model in the examples and exercises of 
this chapter. For a more complete discussion of the two models, see the papers 
by Eisenhart (1947), and Wilk and Kempthorne (1955). 

The assumptions for the fixed-effects model are as follows: 

(a) The k sets of observed data constitute k independent random samples from 
the specified populations. 

(b) Each of the populations represented by a sample is normally distributed, with
mean $\mu_j$ and variance $\sigma_j^2$.

(c) Each of the populations has the same variance. That is, $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$,
the common variance.

(d) The $\tau_j$ are unknown constants. Since (by Equation 8.2.4) $\tau_j = \mu_j - \mu$, and
since the sum of all deviations of values of a variable from their mean is equal
to 0, we may write $\sum \tau_j = 0$.

Three consequences of the relationship

$e_{ij} = x_{ij} - \mu_j$

specified in Equation 8.2.2 are as follows:

(a) The $e_{ij}$ have a mean of 0. This follows from the fact that the mean of the $x_{ij}$
is $\mu_j$.

(b) The $e_{ij}$ have a variance equal to the variance of the $x_{ij}$, since the $e_{ij}$ and $x_{ij}$
differ only by a constant. In other words, the variance of the $e_{ij}$ is equal to $\sigma^2$,
the common variance specified in assumption (c) above.

(c) The $e_{ij}$ are normally (and independently) distributed.

As already noted, the fixed-effects model implies that interest is limited to the 
k populations represented by the sample data. Any inferences that we make apply 
only to these populations. For example, suppose that the treatments represented 
by the sample data are three methods of packaging a product. Any inferences we 
make under model I are limited to these three methods. They are not extended to 
any larger set of methods. 

Snedecor and Cochran (1980) point out that violations of the assumptions of 
equal population variances and normally distributed populations tend to increase 
the probability of rejecting a true null hypothesis. Violating the assumption of 
equal population variances usually causes a worse problem than violating the 
assumption that the populations are all normally distributed. The consequences of 
unequal variances are less severe when the sample sizes are equal. 

3. Hypotheses. Under the present model, we may test the null hypothesis that all 
treatment, or group, means are equal against the alternative that there is at least 
one inequality among them. In general, we may state the hypotheses symbolically 
as follows: 

$H_0$: $\mu_1 = \mu_2 = \cdots = \mu_k$,   $H_1$: not all $\mu_j$ are equal

If the population means are equal, each treatment effect is equal to 0. Alternatively,
we may state the hypotheses as

$H_0$: $\tau_j = 0$, $j = 1, 2, \ldots, k$,   $H_1$: not all $\tau_j = 0$

4. Calculations. To facilitate the calculations, we may display experimental or 
survey data that are to be analyzed by one-way analysis of variance as in Table 
8.2.2. We define the symbols used in Table 8.2.2 as follows: 

$x_{ij}$ = the ith observation that receives the jth treatment,
$i = 1, 2, \ldots, n_j$, $j = 1, 2, \ldots, k$

$n_j$ = the number of observations in the jth group

$T_{.j} = \sum_{i=1}^{n_j} x_{ij}$ = total of the jth column

$\bar{x}_{.j} = T_{.j} / n_j$ = mean of the jth column

$T_{..} = \sum_{j=1}^{k} T_{.j} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}$ = total of all observations

$\bar{x}_{..} = T_{..} / n$ = grand mean

We also let $n = \sum_{j=1}^{k} n_j$.
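
As a rough check on the notation, the following Python sketch computes the column totals, column means, grand total, and grand mean for the coded elasticity data of Table 8.2.1. The variable names are illustrative choices, not notation from the text.

```python
# Elasticity data of Table 8.2.1 (coded values), one list per ingredient.
data = {
    "A": [5, 6, 5, 8, 6, 7, 6, 5, 6, 7],
    "B": [8, 9, 8, 7, 9, 9, 10, 8],
    "C": [10, 10, 9, 8, 8, 9, 10, 9, 8, 9, 10, 8],
}

sizes = {g: len(xs) for g, xs in data.items()}    # number of observations per group
totals = {g: sum(xs) for g, xs in data.items()}   # column totals
means = {g: totals[g] / sizes[g] for g in data}   # column means
n = sum(sizes.values())                           # total number of observations
grand_total = sum(totals.values())                # total of all observations
grand_mean = grand_total / n                      # grand mean
```

The results agree with the totals (61, 68, 108), the grand total (237), and the grand mean (7.9) printed in Table 8.2.1. Note that the three groups have unequal sizes (10, 8, and 12).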



TABLE 8.2.2 Sample data for analysis by one-way analysis of variance

                        Population sampled

         1           2           3           ...   k

         x_11        x_12        x_13        ...   x_1k
         x_21        x_22        x_23        ...   x_2k
         x_31        x_32        x_33        ...   x_3k
         ...         ...         ...               ...
         x_(n_1)1    x_(n_2)2    x_(n_3)3    ...   x_(n_k)k

Total    T_.1        T_.2        T_.3        ...   T_.k        T_..

Mean     x̄_.1        x̄_.2        x̄_.3        ...   x̄_.k        x̄_..


Figure 8.2.1 shows the sample observations $x_{ij}$, represented by dots; the sample
means $\bar{x}_{.j}$, represented by squares; and the grand mean $\bar{x}_{..}$, represented by the
heavy horizontal line, for Example 8.2.1. This figure visually represents the variability
of the observations in the three samples about their respective means and the
variability of the sample means about the grand mean. In Figure 8.2.1, for example,
$e_{4B}$ shows the amount by which the fourth observation in sample B deviates
from the mean of sample B.


FIGURE 8.2.1 
The data of 
Example 8.2.1, 
showing the 
variability within 
samples and the 
variability among 
sample means 





The Total Sum of Squares We define analysis of variance as an arithmetic process
by which we partition the total variation in a set of data into components attributable
to different sources. Variation used in this context refers to a sum of squared
deviations of values from their mean, or sum of squares. The total sum of squares
that we may compute from a set of data is the sum of the squares of the deviations
of each observation from the mean of all the observations taken together. We
define this total sum of squares as

$SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{..})^2$ (8.2.7)

where $\sum_{i=1}^{n_j}$ tells us to add the squared deviations for each group and $\sum_{j=1}^{k}$ instructs
us to sum the k group totals obtained by applying $\sum_{i=1}^{n_j}$.

We may rewrite Equation 8.2.7 in a form that is more convenient for computing
purposes:

$SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}^2 - \frac{T_{..}^2}{n}$ (8.2.8)




We can show by appropriate algebraic manipulation that

$SST = \sum_{j=1}^{k} n_j (\bar{x}_{.j} - \bar{x}_{..})^2 + \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j})^2$ (8.2.9)
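
The partition in Equation 8.2.9 can be checked numerically. The sketch below, which uses the Table 8.2.1 data and variable names chosen only for illustration, computes the total sum of squares directly from its definition and compares it with the sum of the two terms on the right-hand side.

```python
# Elasticity data of Table 8.2.1 (coded values).
data = {
    "A": [5, 6, 5, 8, 6, 7, 6, 5, 6, 7],
    "B": [8, 9, 8, 7, 9, 9, 10, 8],
    "C": [10, 10, 9, 8, 8, 9, 10, 9, 8, 9, 10, 8],
}
n = sum(len(xs) for xs in data.values())
grand_mean = sum(sum(xs) for xs in data.values()) / n

# Left side of Equation 8.2.9: squared deviations from the grand mean.
sst = sum((x - grand_mean) ** 2 for xs in data.values() for x in xs)

# First term on the right: group size times squared deviation of each group mean.
among = sum(len(xs) * (sum(xs) / len(xs) - grand_mean) ** 2 for xs in data.values())

# Second term on the right: squared deviations from each observation's own group mean.
within = sum((x - sum(xs) / len(xs)) ** 2 for xs in data.values() for x in xs)
```

For these data the total is 72.7, which splits into 49.8 for the variation among group means and 22.9 for the variation within groups, apart from floating-point rounding.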


The Among-groups Sum of Squares The first term on the right of Equation 8.2.9
tells us to find the difference between each group mean and the grand mean,
square each of these differences, multiply by the group size, and find the sum of
these products. This quantity is a measure of the variation among the group means.
It is referred to as the among-groups sum of squares, or SSA. The computing
formula for this quantity is

$SSA = \sum_{j=1}^{k} n_j (\bar{x}_{.j} - \bar{x}_{..})^2 = \frac{T_{.1}^2}{n_1} + \frac{T_{.2}^2}{n_2} + \cdots + \frac{T_{.k}^2}{n_k} - \frac{T_{..}^2}{n}$ (8.2.10)

which, when all groups are equal in size, reduces to

$SSA = \frac{\sum_{j=1}^{k} T_{.j}^2}{n_j} - \frac{T_{..}^2}{n}$ (8.2.11)

The Error Sum of Squares The second term on the right of Equation 8.2.9 is the 
pooled sum of squares computed from the values in each group. This procedure 



extends to several groups the pooling procedure for two groups that we described 
in Chapter 7. This component of variation is called the error sum of squares, or 
SSE. It is sometimes called the within sum of squares or the residual sum of 
squares. Although we can compute the error sum of squares directly, it’s more 
convenient to subtract the among-groups sum of squares from the total sum of 
squares. That is, 

SSE = SST - SSA (8.2.12) 

Note that Equation 8.2.8 and Equation 8.2.10 both contain the term

$\frac{T_{..}^2}{n}$

This term, called the correction term C, may be computed only once and used as
needed.
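
The computing formulas lend themselves to a short sketch in which the correction term C is computed once and reused. The code below applies Equations 8.2.8, 8.2.10, and 8.2.12 to the Table 8.2.1 data; the variable names are illustrative only.

```python
# Elasticity data of Table 8.2.1 (coded values).
data = {
    "A": [5, 6, 5, 8, 6, 7, 6, 5, 6, 7],
    "B": [8, 9, 8, 7, 9, 9, 10, 8],
    "C": [10, 10, 9, 8, 8, 9, 10, 9, 8, 9, 10, 8],
}
n = sum(len(xs) for xs in data.values())
grand_total = sum(sum(xs) for xs in data.values())

# Correction term C: squared grand total divided by n, computed once.
c = grand_total ** 2 / n

# Equation 8.2.8: sum of squared observations minus C.
sst = sum(x * x for xs in data.values() for x in xs) - c

# Equation 8.2.10: squared column totals over group sizes, summed, minus C.
ssa = sum(sum(xs) ** 2 / len(xs) for xs in data.values()) - c

# Equation 8.2.12: error sum of squares by subtraction.
sse = sst - ssa
```

Here C = 56,169/30 = 1872.3, the sum of the squared observations is 1945, and the sum of the squared column totals divided by their group sizes is 1922.1, so SST = 72.7, SSA = 49.8, and SSE = 22.9.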

[Li (1964) presents an interesting graphical representation of the various sums
of squares computed in analysis of variance.]

5. ANOVA table. When the true group means are all equal, we can show that
SSA and SSE, when divided by their respective degrees of freedom, yield independent
and unbiased estimates of $\sigma^2$, the population variance that is assumed to
be common to all groups. In the analysis of variance, we refer to a sum of squares
divided by the appropriate degrees of freedom as a mean square. Thus the among-groups
sum of squares divided by the associated degrees of freedom is called the
among-groups mean square (MSA). And the error sum of squares divided by the
error degrees of freedom is called the error mean square (MSE).

Suppose that the null hypothesis is true, that is, there are no treatment effects.
Then the two estimates of $\sigma^2$ ought to be fairly close in size, since they are
independent estimates of the same parameter. Suppose, however, that the null
hypothesis is false. Then the among-groups mean square, which reflects variability
among treatment or group means, ought to be larger than the error mean square,
which is an unbiased estimate of the common population variance even when $H_0$
is not true.

Chapter 7 showed that we can compare two variances by forming their ratio. 
At present we are interested in the following variance ratio: 

$F = \frac{\text{among-groups mean square}}{\text{within-groups mean square}}$ (8.2.13)

If the numerator and denominator of Equation 8.2.13 are about equal, the variance 
ratio is close to 1. Then the hypothesis of equal group means is supported. If, 
however, the among-groups mean square is much larger than the within-groups 
mean square, the variance ratio is much greater than 1. In this case the hypothesis 
of equal group means becomes suspect. 

Even when the null hypothesis is true, it is unlikely that the two estimates of
$\sigma^2$ are equal, because of the uncertainty (variation) of sampling. We must decide,
then, how much greater than 1 the computed variance ratio has to be before we
can conclude that something other than sampling fluctuation is operating. In other
words, we wish to know how large F must be before we are willing to conclude
that the observed difference between the two estimates of $\sigma^2$ is not due to chance
alone.

Chapter 7 showed that the ratio of two sample variances, such as the quantity 
defined by Equation 8.2.13, follows a distribution known as the F distribution 
when the sample variances are computed from samples that have been randomly 
and independently drawn from normal populations. We need, then, to determine 
the appropriate F distribution by observing the degrees of freedom associated with 
the numerator and denominator of F. Once we have done this, the size of the 
observed F that will cause rejection of the hypothesis of equal population variances 
depends on the critical value selected. This, in turn, depends on the significance 
level. In other words, if the computed value of F is equal to or exceeds the critical 
value of F, we reject the null hypothesis of equal population variances. In this 
context of analysis of variance, rejecting this hypothesis is equivalent to rejecting 
the null hypothesis of equal population means. 

We find the number of degrees of freedom associated with the various 
components of variation in the one-way analysis of variance as follows: 

Total degrees of freedom = n - 1 

Among-groups degrees of freedom = k - 1 

Error degrees of freedom = n - 1 - (k - 1) = n - k 

Table 8.2.3, which is an analysis-of-variance (ANOVA) table, summarizes the 
results of this section. 

6. Decision. Suppose that the computed value of F in the last column of Table 
8.2.3 is equal to or greater than the critical value of F. Then we reject the null 
hypothesis of equal means. As with all hypothesis tests, this critical value depends 
on the significance level chosen. We treat the hypothesis test in analysis of 
variance as a one-sided test, even though the hypotheses as stated are like those for 
a two-sided test. As we have noted, if $H_0$ is false, we expect the numerator of F 
to be larger than the denominator. Thus it seems logical that the rejection region 
should be in the right tail of the F distribution. This line of reasoning suggests 
that the p value should be one-sided. 

Let us now apply the six-step one-way analysis-of-variance procedure to 
Example 8.2.1. 


TABLE 8.2.3 ANOVA table for one-way analysis of variance 

Source of       Sum of squares   Degrees of      Mean square 
variation       (SS)             freedom (df)    (MS)                  F 
Among groups    SSA              k - 1           MSA = SSA/(k - 1)     F = MSA/MSE 
Error           SSE              n - k           MSE = SSE/(n - k) 
Total           SST              n - 1 


FIGURE 8.2.2 Picture of the populations represented by Example 8.2.1 when $H_0$ is true and the assumptions are met 


1. Model. $x_{ij} = \mu + \tau_j + e_{ij}$ 

2. Assumptions. Assume that the three ingredients used in the experiment are the 
only ones of interest. Thus the assumptions of the fixed-effects model apply. 

3. Hypotheses 

$$H_0: \mu_A = \mu_B = \mu_C, \qquad H_1: \text{at least one equality does not hold}$$

Let $\alpha = 0.05$. 

Figure 8.2.2 shows the case in which the assumptions are met and $H_0$ is true. 
Figure 8.2.3 shows the case in which the assumptions are met, but $H_0$ is false 
because none of the population means are equal. If any one of the three population 
means is not equal to the other two, the null hypothesis is also false. 

4. Calculations. By Equation 8.2.8, 

$$\text{SST} = 5^2 + 6^2 + \cdots + 8^2 - \frac{237^2}{30} = 72.7$$

and by Equation 8.2.11, 

$$\text{SSA} = \frac{61^2}{10} + \frac{68^2}{8} + \frac{108^2}{12} - \frac{237^2}{30} = 49.8$$


FIGURE 8.2.3 Picture of the populations represented in Example 8.2.1 when the assumptions of equal variances and normally distributed populations are met, but $H_0$ is false because none of the population means are equal 

By subtraction, we find that 

SSE = 72.7 - 49.8 = 22.9 

5. ANOVA table. Table 8.2.4 shows the results. 





TABLE 8.2.4 ANOVA table for Example 8.2.1 

Source          SS      df    MS         F 
Among groups    49.8     2    24.9       29.358 
Error           22.9    27    0.84815 
Total           72.7    29 


6. Decision. The critical value of F for $\alpha = 0.05$ and 2 and 27 degrees of freedom 
as given by Appendix Table G is 3.35. Since 29.358 is greater than 3.35, we 
reject $H_0$. The manufacturer may conclude that the ingredients do have a 
differential effect on the elasticity of the product. Since 29.358 is greater than 6.49, 
p < 0.005. 
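
The arithmetic behind Table 8.2.4 can be checked with a short script. What follows is a minimal sketch in Python, assuming only the group totals (61, 68, 108), the sample sizes (10, 8, 12), and the total sum of squares (72.7) reported in the text. The function name `one_way_anova_from_totals` is ours for illustration and does not come from the book or from any statistical package.

```python
def one_way_anova_from_totals(totals, sizes, sst):
    """Build one-way ANOVA quantities from group totals, group sizes, and SST.

    Illustrative helper (hypothetical name): mirrors the hand calculation
    SSA = sum(T_j^2 / n_j) - T^2 / n and SSE = SST - SSA.
    """
    n = sum(sizes)                       # total number of observations
    k = len(totals)                      # number of groups
    correction = sum(totals) ** 2 / n    # correction term T^2 / n
    ssa = sum(t ** 2 / m for t, m in zip(totals, sizes)) - correction
    sse = sst - ssa                      # error sum of squares by subtraction
    msa = ssa / (k - 1)                  # among-groups mean square
    mse = sse / (n - k)                  # error mean square
    return {"SSA": ssa, "SSE": sse, "df_among": k - 1, "df_error": n - k,
            "MSA": msa, "MSE": mse, "F": msa / mse}


table = one_way_anova_from_totals([61, 68, 108], [10, 8, 12], sst=72.7)
for name, value in table.items():
    print(f"{name:9s} {value:.4f}")
```

Working from totals rather than raw observations keeps the sketch faithful to the computing formulas used in the hand calculation; with raw data one would compute SST directly instead of passing it in.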

One-Way ANOVA and the t Distribution 

The completely randomized experimental design extends to three or more 
treatments the two-independent-samples design for detecting a difference between two 
population means (discussed in Chapter 7). The one-way analysis of variance for 
three or more samples replaces the t test used to detect a significant difference 
between two sample means. Here is an example that shows the relationship 
between analysis of variance and the t test. 


EXAMPLE 8.2.2 Researchers with a pesticide manufacturer wish to compare the 
effectiveness of two formulas for rat poison. Wild rats are randomly assigned to 
receive one of the two formulas. The variable of interest is the rats’ survival time, 
in minutes, after eating a fixed amount of the poison. Table 8.2.5 shows the 
results. We want to test the null hypothesis that the mean survival time is the 
same for both poisons. That is, we want to test 

$$H_0: \mu_A = \mu_B \quad \text{against the alternative} \quad H_1: \mu_A \neq \mu_B$$

TABLE 8.2.5 Survival time in minutes of 22 wild rats fed a fixed amount of one of two poisons 

Poison A   38.7  42.4  34.8  46.6  48.0  36.4  51.1  42.1  43.3  43.9  43.3  46.6 
Poison B   35.1  36.0  38.7  34.4  40.8  35.4  41.5  37.4  43.5  39.6 

Let $\alpha = 0.05$. To test the null hypothesis by means of the t test, we first compute 

$$\bar{x}_A = 43.10, \qquad s_A^2 = 22.53273$$

$$\bar{x}_B = 38.24, \qquad s_B^2 = 9.47822$$

We pool the sample variances, by Equation 6.6.1, to obtain 

$$s_p^2 = \frac{11(22.53273) + 9(9.47822)}{20} = 16.658201$$

We now compute 

$$t = \frac{43.10 - 38.24}{\sqrt{\dfrac{16.658201}{12} + \dfrac{16.658201}{10}}} = 2.78$$



Since 2.78 is greater than 2.0860, the critical t for $\alpha = 0.05$ and 20 degrees 
of freedom, we reject $H_0$. We conclude that the two poisons do result in different 
mean survival times. 

Now let us use one-way analysis of variance to test the null hypothesis of equal 
population means. 


$$C = \frac{(899.6)^2}{22} = 36{,}785.462$$

$$\text{SST} = (38.7)^2 + (42.4)^2 + \cdots + (39.6)^2 - 36{,}785.462 = 37{,}247.46 - 36{,}785.462 = 461.998$$

$$\text{SSA} = \frac{(517.2)^2}{12} + \frac{(382.4)^2}{10} - 36{,}785.462 = 128.834$$

$$\text{SSE} = 461.998 - 128.834 = 333.164$$

From these results, we compute 

$$\text{MSA} = \frac{128.834}{1} = 128.834$$

$$\text{MSE} = \frac{333.164}{20} = 16.6582$$

$$F = \frac{128.834}{16.6582} = 7.73$$

Since 7.73 is greater than 4.35, the critical value of F for $\alpha = 0.05$ and 1 and 
20 degrees of freedom, we reject $H_0$. We conclude that the two population means 
are not equal. 


We see that the statistical decision we reach using analysis of variance is the 
same as the statistical decision we reach when we use the t test. This isn’t a 
coincidence. For the two-sample case, the two tests are equivalent. Note that the 
critical F value, 4.35, is equal to the critical t value squared. That is, 4.35 = 
$(2.0860)^2$. Also, the computed F, 7.73, is equal to the computed t squared. That 
is, 7.73 = $(2.78)^2$. In general, for the same $\alpha$, F with 1 and n - k degrees of 
freedom is equal to the square of t (two-sided test) with n - k degrees of freedom. 
It is this relationship between the t and the F distributions that makes the two tests 
equivalent. 
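
The identity between the computed F and the squared t can be confirmed numerically from the data of Table 8.2.5. The sketch below uses only the Python standard library; the function names `pooled_t` and `one_way_f` are illustrative choices of ours, and the script computes test statistics only (critical values still come from the appendix tables).

```python
from math import sqrt
from statistics import mean, variance

# Survival times (minutes) from Table 8.2.5
poison_a = [38.7, 42.4, 34.8, 46.6, 48.0, 36.4, 51.1, 42.1, 43.3, 43.9, 43.3, 46.6]
poison_b = [35.1, 36.0, 38.7, 34.4, 40.8, 35.4, 41.5, 37.4, 43.5, 39.6]


def pooled_t(a, b):
    """Two-sample t statistic using the pooled variance estimate."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 / na + sp2 / nb)


def one_way_f(groups):
    """F = MSA / MSE for a one-way layout; group sizes may differ."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ssa = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ssa / (k - 1)) / (sse / (n - k))


t_stat = pooled_t(poison_a, poison_b)
f_stat = one_way_f([poison_a, poison_b])
print(f"t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}, F = {f_stat:.2f}")
```

The two functions share no code, so agreement between `t_stat ** 2` and `f_stat` is a genuine check of the relationship rather than a restatement of it.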

Often you may feel that one or more of the assumptions underlying one-way 
analysis of variance are not met. In this case, a procedure known as the 
Kruskal-Wallis one-way analysis of variance may provide a suitable alternative test. 
Chapter 12 discusses this procedure in detail. 


Exercises 



Carry out the six-step analysis-of-variance procedure at the indicated level of significance 
and compute the p value for the test. 

8.2.1 A company testing customer acceptance of a new product uses 4 different counter 
displays, A, B, C, and D. It selects 36 stores, matched on all relevant criteria. Each 
display is used in 9 of the stores. Total sales (coded) at the end of a week are as follows. 
At the 0.05 level of significance, test the null hypothesis of no difference among the four 
means. 

A   5  6  7  7  8  6  7  7  6 
B   2  2  2  3  3  2  3  3  2 
C   2  2  3  3  2  2  2  3  3 
D   6  6  7  8  8  8  6  6  6 

8.2.2 The following data give the production costs, in cents per pound, of broilers 
produced in three production areas, A, B, and C, as reported by 10 producers randomly 
selected from each area. Do these data provide sufficient evidence to indicate a difference 
in mean cost among the three regions? Let $\alpha = 0.05$. 

A   11  10  12  10  11   9   8  13  12  12 
B   12  10   9  11  10  12  12  14   8   9 
C   12  13  15  14  14  11  15  14  14  15 


8.2.3 The following table gives the production cost per dollar of net sales for 24 firms 
with different asset sizes. Do these data provide sufficient evidence to indicate that firms 
of different sizes have different mean costs? Let $\alpha = 0.05$. (Assets given in millions of 
dollars.) 

$10-19.9       69  72  72  66  76  72  70  72 
$20-49.9       75  70  80  74  68  80  72  76 
$50 and over   83  77  80  74  86  75  85  80 


8.2.4 The following table shows the results, in miles per gallon, of an experiment 
conducted to compare 3 brands of gasoline. Each brand was used with 7 different cars of the 
same weight and engine size, driven under similar conditions. Do these data provide 
sufficient evidence at the 0.01 level of significance to indicate a difference in brands of 
gasoline? 

Brand A   14  19  19  16  15  17  20 
Brand B   20  21  18  20  19  19  18 
Brand C   20  26  23  24  23  25  23 


8.3 TESTING FOR SIGNIFICANT DIFFERENCES 
BETWEEN INDIVIDUAL PAIRS OF MEANS 

In those cases in which we obtain a significant F and conclude that “not all means 
are equal,” we may want to perform a test of significance on each pair of treatment 
means. There is a problem inherent in this procedure, however, because of the 
probabilities involved. 

Consider, for example, an experiment with five treatments. To find which of 
all the possible pairs of means are significantly different would require 

$$\binom{5}{2} = \frac{5!}{2!\,3!} = 10$$

tests. Suppose that there is no difference among the treatments. That is, suppose 
that the population means are all equal. If we conduct the tests at the 5% level of 
significance, the probability of rejecting a true null hypothesis is 0.05 for each 
test individually. As long as we conduct a single test, no problem arises. If we 
carry out all 10 tests, however, the probability of rejecting at least one true 
hypothesis is greater than 0.05. If the tests are independent, the probability of 
rejecting at least one true null hypothesis in 10 tests is $1 - (0.95)^{10} = 0.4013$, 
which is considerably larger than 0.05. Tests involving all possible pairs of means 
in an analysis-of-variance context are not independent. The probability of rejecting 
at least one true null hypothesis in this case is hard to obtain. However, it seems 
reasonable to expect that as the number of dependent tests increases, the 
probability of rejecting at least one true hypothesis also increases. 
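
The count of pairwise tests and the 0.4013 figure for independent tests are easy to reproduce. The following is a small sketch in Python; `familywise_error_rate` is a hypothetical helper name, and the formula applies only under the independence assumption stated above, which pairwise tests after an analysis of variance do not satisfy.

```python
from math import comb


def familywise_error_rate(alpha, num_tests):
    """P(at least one false rejection) for independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** num_tests


pairs = comb(5, 2)                          # number of pairs among 5 treatment means
rate = familywise_error_rate(0.05, pairs)   # 1 - 0.95**10
print(pairs, round(rate, 4))
```

Trying other values of `num_tests` shows how quickly the rate grows: it already exceeds 0.05 with two tests.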


Pairwise Comparisons 

Interest is usually focused on pairwise comparisons. A pairwise comparison is 
the difference between two means without regard to the algebraic sign. 
Investigation of comparisons between pairs of means may take the form of a hypothesis 
test of the difference between two means. Or we may want to construct a 
confidence interval for the difference. We may encounter either of two situations 
involving comparisons. 

(a) Before we conduct a study, we may decide that it is worthwhile to compare 
only certain pairs of sample treatment means to see whether they are significantly 
different. We call such comparisons planned or a priori comparisons. We may 
make a priori comparisons whether or not the computed F in the analysis of 
variance is significant. We plan a priori comparisons before or prior to analyzing 
the sample data, as the name implies. 

(b) At other times we may have no basis for planning comparisons among means 
before we conduct the study. If the F value, computed in the analysis of variance, 
is not significant, this indicates that there is no evidence of a treatment effect. 
Thus we will probably not be interested in comparing individual pairs of means. 
However, if the computed F is significant, we are likely to want to find which 
pairs of sample treatment means are significantly different. We call comparisons 
made after the initial analysis of variance a posteriori or post hoc comparisons. 

When the fixed-effects model applies, there are several procedures that we can 
use to make all possible pairwise comparisons among means, whether or not these 
comparisons were planned in advance. Using these procedures, we can make a 
large number of comparisons routinely. Or, after the experiment, we can select 
those comparisons that appear most interesting. 


Tukey's HSD Test 

J. W. Tukey (1953) proposed a procedure for making all pairwise comparisons 
among means. This method, which is now widely used, is called the HSD 
(honestly significant difference) test or the w procedure. When Tukey’s test is used 
with equal sample sizes, we compute a single value with which we compare all 
differences. This value, called the HSD, is given by the following formula: 

$$\text{HSD} = q_{\alpha,k,n-k}\sqrt{\frac{\text{MSE}}{n_j}} \qquad (8.3.1)$$

where q is obtained from Appendix Table H for significance level $\alpha$, k means in 
the experiment, and n - k error degrees of freedom. Any difference between 
pairs of means that exceeds HSD is declared significant. Note that the HSD statistic 
requires that all sample sizes be equal; that is, $n_1 = n_2 = \cdots = n_k$. Let us use 
an example to illustrate the way this procedure works. 


EXAMPLE 8.3.1 Table 8.3.1 presents data from an experiment designed to compare 
4 methods (A, B, C, and D) used in the production of a cleaning compound. The 
variable of interest is the percent solid content of the compound. There are 6 
observations for each method. The computed F is significant at a = 0.01, the 
significance level selected before the analysis. The logical question now is: Where 
do the significant differences occur? 

Tukey’s HSD test provides the answer. When we apply this method to the data 
of Table 8.3.1, we first display the absolute values of the differences between 
means, as in Table 8.3.2. The sample treatment means arranged in ascending 
order of magnitude provide the row and column labels. The body of the table 
gives the corresponding differences. If we choose a significance level of $\alpha = 0.01$, 
we find $q_{0.01,4,20}$ in Table H to be 5.02. From Table 8.3.1, MSE = 57.8281, 
and we compute 

$$\text{HSD} = 5.02\sqrt{\frac{57.8281}{6}} = 15.58$$



When we compare the differences between means shown in Table 8.3.2 with 
15.58, we realize that only one pair of means is significantly different, 
$|\bar{x}_C - \bar{x}_D| = 18.33$. 
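
The comparison of every pairwise difference with the HSD can be sketched in a few lines of Python. The studentized-range value q = 5.02 is taken from the text (Table H) rather than computed, and `tukey_hsd` is an illustrative function name of ours, not a library call.

```python
from itertools import combinations
from math import sqrt


def tukey_hsd(q, mse, n_per_group):
    """HSD = q * sqrt(MSE / n) for equal sample sizes (Equation 8.3.1)."""
    return q * sqrt(mse / n_per_group)


# Treatment means from Table 8.3.1; q read from Table H for alpha = 0.01, k = 4, 20 df
means = {"A": 75.67, "B": 78.83, "C": 69.17, "D": 87.50}
hsd = tukey_hsd(q=5.02, mse=57.8281, n_per_group=6)

# Keep only the pairs whose absolute difference exceeds the HSD
significant = [(a, b) for a, b in combinations(sorted(means), 2)
               if abs(means[a] - means[b]) > hsd]
print(round(hsd, 2), significant)
```

Because a single HSD serves every pair when sample sizes are equal, the criterion is computed once and reused inside the comparison loop.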

When the samples are not all of the same size, one can’t apply the Tukey HSD 
test given by Equation 8.3.1. Spjøtvoll and Stoline (1973), however, have 
extended Tukey’s method to the case in which the sizes of samples are different. 
One can apply their procedure in experiments involving three or more treatments 
and significance levels of 0.05 or less. Their method consists of replacing $n_j$ in 
Equation 8.3.1 with $n_j^*$, the smallest of the samples whose means you are 
comparing. We call the new quantity HSD*. We then have as the new test criterion: 

$$\text{HSD}^* = q_{\alpha,k,n-k}\sqrt{\frac{\text{MSE}}{n_j^*}}$$

We call “significant” any absolute value of the difference between two sample 
means that exceeds the proper HSD*. 


TABLE 8.3.1 Analysis-of-variance table, Example 8.3.1 

Source          SS         df    MS         F 
Among groups    1045.44     3    348.49     6.026 
Error           1156.56    20    57.8281 
Total           2202.00    23 

Method    A        B        C        D 
Mean      75.67    78.83    69.17    87.50 

TABLE 8.3.2 Differences between all pairs of means, Example 8.3.1 

                          $\bar{x}_C$    $\bar{x}_A$    $\bar{x}_B$    $\bar{x}_D$ 
$\bar{x}_C = 69.17$       -              6.50           9.66           18.33* 
$\bar{x}_A = 75.67$                      -              3.16           11.83 
$\bar{x}_B = 78.83$                                     -              8.67 
$\bar{x}_D = 87.50$                                                    - 

*Significant at the 0.01 level 



To see the use of the HSD* statistic at the $\alpha = 0.05$ level, refer to Example 
8.2.1. The sample means, as shown in Table 8.2.1, are 6.1, 8.5, and 9.0. These 
means are computed from samples of size 10, 8, and 12, respectively. The absolute 
values of the differences between all possible pairs of means are 

|6.1 - 8.5| = 2.4 

|6.1 - 9.0| = 2.9 

|8.5 - 9.0| = 0.5 

The value of $q_{0.05,3,27}$, obtained by interpolation from Table H, is 3.51. To test 
$H_0: \mu_A = \mu_B$, we have 

$$\text{HSD}^* = 3.51\sqrt{\frac{0.84815}{8}} = 1.14$$

Since 2.4 is greater than 1.14, we reject $H_0$. To test $H_0: \mu_A = \mu_C$, we compute 

$$\text{HSD}^* = 3.51\sqrt{\frac{0.84815}{10}} = 1.02$$

Since 2.9 is greater than 1.02, we reject $H_0$. Finally, to test $H_0: \mu_B = \mu_C$, we 
compute 

$$\text{HSD}^* = 3.51\sqrt{\frac{0.84815}{8}} = 1.14$$

Since 0.5 is not greater than 1.14, we cannot reject $H_0$ this time. We conclude, 
then, that $\mu_A$ is different from $\mu_B$ and $\mu_C$, but that $\mu_B$ and $\mu_C$ may be equal. 
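
With unequal sample sizes the criterion changes from pair to pair, which is the main practical difference from the equal-size case. The sketch below follows the rule as this section describes it (use the smaller of the two sample sizes in each comparison); q = 3.51 and MSE = 0.84815 are taken from the text, and the name `hsd_star` is our own illustrative choice.

```python
from itertools import combinations
from math import sqrt


def hsd_star(q, mse, n_a, n_b):
    """HSD* = q * sqrt(MSE / n*), with n* the smaller of the two sample sizes."""
    return q * sqrt(mse / min(n_a, n_b))


# Sample means and sizes for Example 8.2.1, as quoted in the text
means = {"A": 6.1, "B": 8.5, "C": 9.0}
sizes = {"A": 10, "B": 8, "C": 12}

results = {}
for a, b in combinations(means, 2):
    criterion = hsd_star(3.51, 0.84815, sizes[a], sizes[b])
    reject = abs(means[a] - means[b]) > criterion
    results[(a, b)] = (round(criterion, 2), reject)

for pair, (criterion, reject) in results.items():
    print(pair, criterion, "reject H0" if reject else "do not reject H0")
```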

[Other multiple-comparison procedures include those proposed by Duncan (1949, 
1951, 1952, 1955), Dunnett (1955, 1964), Fisher (1966), Keuls (1952), Kramer 
(1956), Newman (1939), Rodger (1974), Scheffe (1953, 1959), and Tukey (1949a). 
The various multiple-comparison procedures have been discussed and compared 
by Bancroft (1968), Chen (1960), Daniel and Coogler (1975), Gill (1973), McCall 
(1960a,b,c), and Winer (1971). Daniel (1980) has prepared a bibliography of 
multiple-comparison procedures. ] 

Mead and Pike (1975) and Peterson (1977) point out that multiple-comparison 
procedures are usually not appropriate when the treatments are quantitative rather 
than qualitative. Consider a study of characteristics of department stores. If the 
“treatments” are formed by grouping stores on the basis of size, measured by 
floor space or sales volume, they are quantitative. 


Computers and ANOVA 

Computers make it much easier to do the calculations you need for analysis of 
variance. You can work the exercises in this chapter by using one of the computer 
statistical packages that are available. The input requirements and output formats 
for the various statistical packages vary somewhat, but anyone familiar with the 
general concepts of analysis of variance can easily understand them. 

Figure 8.3.1 shows the output for Example 8.2.1 provided by a one-way 
analysis-of-variance program found in the Minitab package written by Ryan, Joiner, 
and Ryan (1976). Compare the ANOVA table on the printout with the one given 
in Table 8.2.4. You can see that the printout uses the label “factor” instead of 
“among groups.” The different treatments or groups are referred to on the printout 
as levels. Thus level 1 = treatment A, level 2 = treatment B, and level 3 = 
treatment C. The printout gives the three group means and standard deviations, 
as well as the pooled standard deviation. Note: This last quantity is equal to the 
square root of the error mean square shown in the ANOVA table. Finally, the 
computer output gives graphic representations of the 95% confidence intervals for 
the mean of each of the three populations represented by the sample data. The 
Minitab package can also provide additional analyses through the use of 
appropriate commands. The package also contains programs for two-way analysis of 
variance. 

FIGURE 8.3.1 Computer output for analysis of variance of data of Example 8.2.1, using Minitab statistical package 

ANALYSIS OF VARIANCE 

DUE TO    DF        SS    MS = SS/DF    F-RATIO 
FACTOR     2    49.800        24.900      29.36 
ERROR     27    22.900         0.848 
TOTAL     29    72.700 

LEVEL     N     MEAN    ST. DEV. 
1        10    6.100       0.994 
2         8    8.500       0.926 
3        12    9.000       0.853 

POOLED ST. DEV. = 0.921 

INDIVIDUAL 95 PERCENT C. I. FOR LEVEL MEANS 

Exercises 

8.3.1 Apply Tukey’s test to the data of Exercise 8.2.1. Which counter display should be 
chosen? 

8.3.2 Apply Tukey’s test to the data of Exercise 8.2.2. 

8.3.3 Apply Tukey’s test to the data of Exercise 8.2.4. State the statistical hypotheses to 
be tested. 

8.4 THE RANDOMIZED COMPLETE BLOCK DESIGN 

A paint manufacturer wants to compare the hiding quality of paint produced by 
five different formulas, but wants to eliminate from the experimental error the 
variability that would result if the paint were applied to different surfaces. What 
experimental design is appropriate? The objectives suggest that we use the 
randomized complete block design. This design is both easily understood and 
computationally simple. It is appropriate (a) when we can meaningfully cross-classify 
the experimental units according to two criteria, or (b) when we can apply 
treatments to different experimental material. One of the criteria of classification is 
called treatments, the other, blocks. 

In the randomized complete block design, each treatment must be applied to 
each block. We achieve randomization by randomly assigning the treatments within 
the blocks according to a table of random numbers or any other randomization 
procedure. Cochran and Cox (1968) present special tables that can be used for up 
to 16 treatments. The primary objective of the randomized complete block design 
is to isolate and remove from the error variation the variation attributable to the 
blocks, while at the same time ensuring that treatment means are free of block 
effects. The effectiveness of the design depends on how well you achieve 
homogeneous blocks of experimental units. This, in turn, depends on knowing the 
experimental material well. 

In industrial applications, we may use raw material as a blocking factor. In 
comparing different metal-processing machines, for example, we can use each 
machine on several types of metal. Suppose that our objective is to compare the 
performances of different operators. We may want to eliminate differences caused 
by experience or training. We may also use the randomized complete block design 
when we must carry out an experiment in more than one factory (block), or when 
we need several days (blocks) to complete it. In the problem at the beginning of 
this section, the “treatments” are the five paint formulas, and the “blocks” are 
the different types of surfaces to which the paint is applied. 

The analysis-of-variance technique used in the analysis is called the two-way 
analysis of variance, since we classify the data according to two criteria. In 
general, we can use a table such as Table 8.4.1 to display the data generated by 
the randomized complete block design. 


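TABLE 8.4.1 Sample data for analysis by two-way analysis of variance 

                      Treatments 
Blocks    1           2           3           ...    k           Total       Mean 
1         $x_{11}$    $x_{12}$    $x_{13}$    ...    $x_{1k}$    $T_{1.}$    $\bar{x}_{1.}$ 
2         $x_{21}$    $x_{22}$    $x_{23}$    ...    $x_{2k}$    $T_{2.}$    $\bar{x}_{2.}$ 
3         $x_{31}$    $x_{32}$    $x_{33}$    ...    $x_{3k}$    $T_{3.}$    $\bar{x}_{3.}$ 
...       ...         ...         ...         ...    ...         ...         ... 
n         $x_{n1}$    $x_{n2}$    $x_{n3}$    ...    $x_{nk}$    $T_{n.}$    $\bar{x}_{n.}$ 
Total     $T_{.1}$    $T_{.2}$    $T_{.3}$    ...    $T_{.k}$    $T_{..}$ 
Mean      $\bar{x}_{.1}$    $\bar{x}_{.2}$    $\bar{x}_{.3}$    ...    $\bar{x}_{.k}$                $\bar{x}_{..}$ 

Table 8.4.1 introduces the following new notation: 

k = the number of treatments, n = the number of blocks 

Total of the ith block: $T_{i.} = \sum_{j=1}^{k} x_{ij}$ 

Mean of the ith block: $\bar{x}_{i.} = \dfrac{T_{i.}}{k}$ 

Grand total: $T_{..} = \sum_{j=1}^{k} T_{.j} = \sum_{i=1}^{n} T_{i.}$ 

Note: We can find the grand total by adding either row totals or column totals. 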

We shall follow the six-step analysis-of-variance procedure introduced in 
Section 8.1 to present the analysis of data from a randomized complete block design. 

1. Model. By an argument similar to that used in Section 8.2, we can establish 
the following model for the randomized complete block design: 

$$x_{ij} = \mu + \beta_i + \tau_j + e_{ij} \qquad (8.4.1)$$

where $x_{ij}$ is a typical value from the overall population 

$\mu$ is an unknown constant 

$\beta_i$ represents a block effect, reflecting the fact that the experimental unit 
fell in the ith block 

$\tau_j$ represents a treatment effect, reflecting the fact that the experimental 
unit received the jth treatment 

$e_{ij}$ is a residual component representing all sources of variation other than 
treatments and blocks 

2. Assumptions. We make the following assumptions for valid use of the 
randomized complete block design: (a) Each observed $x_{ij}$ constitutes an independent 
random sample of size 1 from one of the kn populations represented. (b) Each of 
these kn populations is normally distributed with mean $\mu_{ij}$ and the same variance 
$\sigma^2$. This assumption, along with (a), implies that the $e_{ij}$ are independently and 
normally distributed with mean 0 and variance $\sigma^2$. (c) The block and treatment 
effects are additive. To state this assumption another way, there is no interaction 
between treatments and blocks. When there is no interaction, the effect of a 
particular block-treatment combination is not greater, nor less, than the sum of 
their individual effects. (We shall discuss interaction in more detail in Section 
8.6.) We can show that when this assumption is met, 

$$\sum_{j=1}^{k} \tau_j = \sum_{i=1}^{n} \beta_i = 0$$

If the assumption of additivity is not met, the analysis of variance may produce 
misleading results. 

According to Anderson and Bancroft (1952), however, we need not be 
concerned unless the largest mean is more than 50% greater than the smallest. If we 
feel that the additivity assumption is not met, we should use appropriate procedures 
for handling the situation. A test of additivity presented by Tukey (1949b) is often 
used. Mandel (1971) also dealt with the nonadditivity problem. 

When the given assumptions hold, the $\tau_j$ and $\beta_i$ are a set of fixed constants, 
and the data fit the fixed-effects model. We often consider the blocks to be a 
random sample from some population of blocks. When this is true, and when the 
treatments are fixed, the model is known as the mixed model. When both blocks 
and treatments are random samples from populations of blocks and treatments, 
respectively, we have the random-effects model. 

3. Hypotheses. In general, we test 

$$H_0: \mu_{.1} = \mu_{.2} = \mu_{.3} = \cdots = \mu_{.k} \quad \text{versus} \quad H_1: \text{at least one equality does not hold}$$

In other words, we test the null hypothesis that the treatment means are all equal 
or, equivalently, that there are no differences in treatment effects. 

Although we can test hypotheses about block means, we are seldom interested 
in such tests when the fixed-effects model applies, since our primary interest is 
in treatment effects. We introduce the blocks merely to eliminate a source of 
extraneous variation. Also, although we randomly assign experimental units to 
treatments, we usually obtain blocks in a nonrandom manner. 

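4. Calculations. We can partition the total sum of squares (SST) for the 
randomized complete block design into three components, one each attributable to 
treatments (SSTr), blocks (SSB), and error (SSE). We may express the partitioned 
sum of squares by the following equation: 

$$\sum_{j=1}^{k}\sum_{i=1}^{n} (x_{ij} - \bar{x}_{..})^2 = \sum_{j=1}^{k}\sum_{i=1}^{n} (\bar{x}_{.j} - \bar{x}_{..})^2 + \sum_{j=1}^{k}\sum_{i=1}^{n} (\bar{x}_{i.} - \bar{x}_{..})^2 + \sum_{j=1}^{k}\sum_{i=1}^{n} (x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}_{..})^2 \qquad (8.4.2)$$

That is, 

$$\text{SST} = \text{SSTr} + \text{SSB} + \text{SSE} \qquad (8.4.3)$$

The computing formulas for the quantities in Equations 8.4.2 and 8.4.3 are: 

$$\text{SST} = \sum_{j=1}^{k}\sum_{i=1}^{n} x_{ij}^2 - C \qquad (8.4.4)$$

$$\text{SSB} = \frac{\sum_{i=1}^{n} T_{i.}^2}{k} - C \qquad (8.4.5)$$

$$\text{SSTr} = \frac{\sum_{j=1}^{k} T_{.j}^2}{n} - C \qquad (8.4.6)$$

$$\text{SSE} = \text{SST} - \text{SSB} - \text{SSTr} \qquad (8.4.7)$$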


We compute the correction term as follows: 

$$C = \frac{\left(\sum_{j=1}^{k}\sum_{i=1}^{n} x_{ij}\right)^2}{kn} = \frac{T_{..}^2}{kn} \qquad (8.4.8)$$

The degrees of freedom for each component of Equation 8.4.3 are 

$$\underbrace{kn - 1}_{\text{Total}} = \underbrace{(k - 1)}_{\text{Treatments}} + \underbrace{(n - 1)}_{\text{Blocks}} + \underbrace{(n - 1)(k - 1)}_{\text{Residual}}$$

We find the residual degrees of freedom, like the residual sum of squares, by 
subtraction: 

$$(kn - 1) - (k - 1) - (n - 1) = kn - 1 - k + 1 - n + 1 = n(k - 1) - 1(k - 1) = (n - 1)(k - 1)$$

5. ANOVA table. We can display the results of the calculations for the randomized 
complete block design in an analysis-of-variance table such as Table 8.4.2. 

6. Decision. When the fixed-effects model applies and the null hypothesis of equal 
treatment effects (all $\tau_j = 0$) is true, both the error mean square and the treatments 
mean square are estimates of the common variance $\sigma^2$. Consequently, when the 
null hypothesis is true, the ratio 

$$\frac{\text{MSTr}}{\text{MSE}}$$

follows the F distribution with k - 1 numerator degrees of freedom and with 
(n - 1)(k - 1) denominator degrees of freedom. We compare the computed 
ratio with the critical value of F. If this ratio is equal to or exceeds the critical 
value of F, we reject the null hypothesis. 

Here is an example that shows the analysis of variance for a randomized 
complete block design. 


EXAMPLE 8.4.1 Refer to the case at the beginning of this section. Suppose that the 
paint manufacturer conducts the experiment to see whether the five paint formulas 
differ with respect to hiding quality. In this experiment, the types of paint are 
considered treatments. The blocking factor is the type of surface to which the 
paint is applied. Table 8.4.3 shows the results of the experiment. 

The appropriate analysis-of-variance is as follows. 


TABLE 8.4.2 ANOVA table for a two-way analysis of variance 

Source        SS      df                MS                               F 
Treatments    SSTr    k - 1             MSTr = SSTr/(k - 1)              MSTr/MSE 
Blocks        SSB     n - 1             MSB = SSB/(n - 1) 
Error         SSE     (n - 1)(k - 1)    MSE = SSE/[(n - 1)(k - 1)] 
Total         SST     kn - 1 


TABLE 8.4.3 Measurements of hiding quality taken on five types of paint, each applied to five different surfaces 

                       Type of paint 
Surface    $T_1$    $T_2$    $T_3$    $T_4$    $T_5$    Total 
$B_1$      20       12       20       10       14       76 
$B_2$      22       10       20       12        6       70 
$B_3$      24       14       18       18       10       84 
$B_4$      16        4        8        6       18       52 
$B_5$      26       22       16       20       10       94 
Total      108      62       82       66       58       376 
Means      21.6     12.4     16.4     13.2     11.6     15.04 


1. Model. Since the value of a particular observation is the result of a formula 
(treatment) effect, a surface (block) effect, and the effects of factors not accounted 
for (error), the model specified by Equation 8.4.1 seems appropriate. 

2. Assumptions. Assume that the usual assumptions for the randomized complete 
block design apply. 

3. Hypotheses. Since we are not interested in comparing surface effects, our 
hypotheses relate to the formulas, or treatments. Thus we have 


$$H_0: \mu_{.1} = \mu_{.2} = \mu_{.3} = \mu_{.4} = \mu_{.5}, \qquad H_1: \text{at least one equality does not hold}$$

Let $\alpha = 0.05$. 

4. Calculations. 

$$\text{SST} = 20^2 + 12^2 + \cdots + 10^2 - \frac{376^2}{(5)(5)} = 6536 - 5655.04 = 880.96$$

$$\text{SSTr} = \frac{108^2 + 62^2 + \cdots + 58^2}{5} - 5655.04 = 335.36$$

$$\text{SSB} = \frac{76^2 + 70^2 + \cdots + 94^2}{5} - 5655.04 = 199.36$$

$$\text{SSE} = 880.96 - 335.36 - 199.36 = 346.24$$

5. ANOVA table. Table 8.4.4 is the ANOVA table for our example. 


TABLE 8.4.4 ANOVA table, paint experiment 

Source        SS        df    MS       F 
Treatments    335.36     4    83.84    3.87 
Blocks        199.36     4    49.84 
Error         346.24    16    21.64 
Total         880.96    24 


6. Decision. The critical value of F for $\alpha = 0.05$ and 4 and 16 degrees of freedom 
is 3.01. The computed value of F, 3.87, is greater than the critical value of F. 
We therefore reject the null hypothesis that the treatment means are all equal. We 
conclude that the five types of paint do differ with respect to hiding quality. Since 
4.77 > 3.87 > 3.73, we have for this test 0.01 < p < 0.025. 
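
The sums of squares in Table 8.4.4 can be reproduced directly from the data of Table 8.4.3 using the computing formulas of Equations 8.4.4 through 8.4.8. The following is a minimal sketch in Python with rows as blocks (surfaces) and columns as treatments (paints); the function name `rcb_anova` and the dictionary keys are illustrative choices of ours.

```python
# Rows are blocks (surfaces B1..B5); columns are treatments (paints T1..T5)
paint = [
    [20, 12, 20, 10, 14],
    [22, 10, 20, 12, 6],
    [24, 14, 18, 18, 10],
    [16, 4, 8, 6, 18],
    [26, 22, 16, 20, 10],
]


def rcb_anova(data):
    """Two-way ANOVA for a randomized complete block design, one value per cell."""
    n = len(data)                                   # number of blocks
    k = len(data[0])                                # number of treatments
    grand_total = sum(sum(row) for row in data)
    c = grand_total ** 2 / (k * n)                  # correction term (8.4.8)
    sst = sum(x ** 2 for row in data for x in row) - c           # (8.4.4)
    ssb = sum(sum(row) ** 2 for row in data) / k - c             # (8.4.5)
    sstr = sum(sum(col) ** 2 for col in zip(*data)) / n - c      # (8.4.6)
    sse = sst - ssb - sstr                                       # (8.4.7)
    mstr = sstr / (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    return {"SST": sst, "SSTr": sstr, "SSB": ssb, "SSE": sse,
            "MSTr": mstr, "MSE": mse, "F": mstr / mse}


anova = rcb_anova(paint)
for name, value in anova.items():
    print(f"{name:5s} {value:.2f}")
```

`zip(*data)` transposes the table so that treatment (column) totals can be summed the same way as block (row) totals.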

Suppose that the assumptions underlying the randomized complete block design 
as presented in this section are not met. Then we may use an alternative, known 
as the Friedman two-way analysis of variance, as a hypothesis-testing procedure. 
We discuss this procedure in detail in Chapter 12. 

We can use Tukey’s multiple-comparison procedure with the analysis of data 
from a randomized complete block design. See, for example, Guenther (1964) 
and Kirk (1968). 

Exercises 

Carry out the six-step analysis-of-variance procedure at the indicated level of significance 
and compute the p value for the test. 

8.4.1 The following data show the charges made by radio stations for a fixed-length spot 
announcement by size of radio station and trade area in which the station is located. After 
eliminating the effect of size, do these data suggest a difference in charges among the five 
trade areas? Let $\alpha = 0.05$. 

                         Trade area 
Size (watts)    I     II    III    IV    V 
500              7     5      2     5     2 
1,000           16    11      8     6     5 
5,000           18    17     10    12     8 
10,000          60    24     16    15    12 

What action(s) would you suggest to advertisers as a result of your statistical analysis? 
Would you consider redoing this analysis after dividing each cost figure by the given radio 
station’s wattage? How would your ANOVA table differ? 

8.4.2 Batches of homogeneous raw material are analyzed in 4 laboratories for the presence 
of lead. Three methods are used. The following table shows the reported amounts of lead 
per unit volume of raw material. After eliminating laboratory effects, do these data suggest 
a difference in detection ability among the 3 methods? Let α = 0.05.


Laboratory 


Method 


A 


B 


C 




What method of detection would you suggest? Does the cost of the method make a
difference?

8.4.3 The following table shows the density (coded) of a certain solution by temperature 
and by the technician who prepared the solution. After eliminating the effects of different 
technicians, do these data suggest a difference in mean density at different temperatures? 
Let α = 0.05.

                  Temperature (in degrees Centigrade)
Technician        10      20      30      40

A                  7       7       8      10
B                  8       8       7      10
C                  8       8       7       9
D                  9       8       8       9
E                  8       7       8       9



8.4.4 An engineer compares the tensile strength of a certain material produced by four 
machines at four different temperatures. The effects of the machines are eliminated by 
blocking. The results (coded) are as follows. Do these data provide sufficient evidence at 
the 0.01 level of significance to indicate that temperature has an effect on tensile strength? 


                  Temperature
Machine       A       B       C       D

I            12      26      24      23
II           15      29      23      25
III          15      27      25      24
IV           18      38      33      31


8.5 THE LATIN SQUARE DESIGN 

A researcher with a plastics firm wants to compare the tensile strength of plastic
sheets made by four different processing methods. The researcher feels that there
are two main sources of extraneous variation: the technician who mixes the formula
and the equipment used in the manufacturing process. What experimental
design should be used? Since there are two identifiable sources of extraneous
variation, the researcher needs a design that will isolate and remove both sources
from the residual. The Latin square design is such a design.

The term Latin square was first used in an analysis-of-variance context by R.
A. Fisher (1926), who apparently borrowed the term from the Swiss mathematician
Leonhard Euler (1707-1783).

In the Latin square design, we assign one source of extraneous variation to the 
columns of the square and the second source of extraneous variation to the rows 
of the square. We then assign the treatments, designated by capital letters, in such 
a way that each treatment occurs once and only once in each row and each column. 
The number of rows, the number of columns, and the number of treatments, 
therefore, are all equal. Table 8.5.1 shows a typical Latin square. 


TABLE 8.5.1 A typical Latin square

                 Columns
Rows      1     2     3     4     5

1         B     A     E     C     D
2         D     C     A     B     E
3         C     B     D     E     A
4         A     E     C     D     B
5         E     D     B     A     C



The Latin square design has the following advantages: 

1. Eliminating two sources of extraneous variation often leads to a smaller error 
mean square than we would obtain using the randomized complete block design. 

2. The analysis of variance is simple. 

3. There are simple procedures for handling certain complications that may arise 
in the course of the experiment. 

Small Latin squares provide only a small number of degrees of freedom for the
error mean square. So a minimum size of 5 × 5 is usually recommended. For a
4 × 4 square, for example, there are only 6 degrees of freedom associated with
the error mean square. For a 5 × 5 square, there are 12 error degrees of freedom.
Since there must be as many rows and columns as treatments, a Latin square
larger than 12 × 12 is seldom practical.

We get randomization in the Latin square by randomly selecting a square of 
the desired dimension from all possible squares of that dimension. One method 
of doing this is to randomly assign a different treatment to each cell in each 
column with the restriction that each treatment must appear once, and only once, 
in each row. We may also select squares from those published by Fisher and Yates 
(1957) or Cochran and Cox (1968). 
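
One simple way to obtain a randomized square by computer is sketched below in Python. It is an assumption-laden illustration, not the selection method described above: it starts from a cyclic square and randomly permutes its rows, its columns, and its treatment labels. Every result is a valid Latin square, but this shortcut does not draw with equal probability from all possible squares of the given dimension. The function names `random_latin_square` and `is_latin_square` are hypothetical.

```python
import random


def random_latin_square(treatments, rng=None):
    """Return an r x r Latin square built by shuffling a cyclic square."""
    if rng is None:
        rng = random.Random()
    r = len(treatments)
    # Cyclic starting square: cell (i, j) holds label index (i + j) mod r.
    square = [[(i + j) % r for j in range(r)] for i in range(r)]
    # Randomly permute the rows.
    rng.shuffle(square)
    # Randomly permute the columns.
    col_order = list(range(r))
    rng.shuffle(col_order)
    square = [[row[j] for j in col_order] for row in square]
    # Randomly assign treatment labels to the index values.
    labels = list(treatments)
    rng.shuffle(labels)
    return [[labels[k] for k in row] for row in square]


def is_latin_square(square):
    """Check that each symbol occurs exactly once in every row and column."""
    r = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == r and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(r)} == symbols for j in range(r))
    return len(symbols) == r and rows_ok and cols_ok


if __name__ == "__main__":
    for row in random_latin_square("ABCDE"):
        print(" ".join(row))
```

Shuffling rows, columns, and labels preserves the defining property that each treatment occurs once and only once in each row and each column, which `is_latin_square` verifies.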

We display sample data resulting from a Latin square design in a table such as 
Table 8.5.2. 

In this table, the first subscript refers to the row, the second to the column, and
the third to the treatment. We assign treatments randomly to the cells defined by
the intersections of particular rows and columns. The treatment appearing in a
given cell depends on the particular randomization scheme. We use the letter k
to refer to a particular treatment. A value from the ith row and the jth column
receiving treatment k is designated as $x_{ijk}$. The subscripts all run from 1 to r. Since
the numbers of rows, columns, and treatments are equal, we write the kth treatment
total and kth treatment mean, respectively, as

$$T_{..k} \quad \text{and} \quad \bar{x}_{..k}$$

Here are the six steps in the analysis of variance for the Latin square design.


TABLE 8.5.2 Sample data from the Latin square design

                               Columns
Rows              1        2        3       ...     r        Row totals   Row means

1                 x_11k    x_12k    x_13k   ...     x_1rk    T_1..        x̄_1..
2                 x_21k    x_22k    x_23k   ...     x_2rk    T_2..        x̄_2..
3                 x_31k    x_32k    x_33k   ...     x_3rk    T_3..        x̄_3..
...
r                 x_r1k    x_r2k    x_r3k   ...     x_rrk    T_r..        x̄_r..

Column totals     T_.1.    T_.2.    T_.3.   ...     T_.r.    T_...
Column means      x̄_.1.    x̄_.2.    x̄_.3.   ...     x̄_.r.                 x̄_...


1. Model. We write the model for the Latin square design as

$$x_{ijk} = \mu + \alpha_i + \beta_j + \tau_k + e_{ijk} \tag{8.5.1}$$

where $x_{ijk}$ = a typical value generated by the experiment

$\mu$ = an unknown constant

$\alpha_i$ = a row effect, reflecting the fact that the experimental unit appeared
in the ith row

$\beta_j$ = a column effect, reflecting the fact that the experimental unit
appeared in the jth column

$\tau_k$ = a treatment effect, reflecting the fact that the experimental unit
received the kth treatment

$e_{ijk}$ = a residual component representing all sources of variation other than
rows, columns, and treatments.

2. Assumptions. (a) Each observation constitutes an independent random sample
of size 1 from the population defined by the cell in which the observation occurs.
In general, there are $r^2$ such populations. (b) Each of the $r^2$ populations is normally
distributed. (c) The variances of the $r^2$ populations are all equal. (d) The row,
column, and treatment effects are additive. That is, there is no interaction among
rows and columns; rows and treatments; columns and treatments; or rows, columns,
and treatments.

Tukey (1955) gave a test for nonadditivity in Latin squares. Snedecor and 
Cochran (1980) illustrate its use. 

3. Hypotheses. We want to know whether the results of the experiment provide
evidence of a true difference in treatment effects. At some significance level α,
therefore, we test the null hypothesis that all treatment means are equal against
the alternative that there is a difference between at least one pair of means. We
state the hypotheses formally as follows.

$$H_0\colon \mu_{..1} = \mu_{..2} = \mu_{..3} = \cdots = \mu_{..r}$$

$$H_1\colon \text{not all treatment means are equal}$$

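4. Calculations. It can be shown that we can partition the total sum of squares
for the Latin square design into the following components:

$$
\begin{aligned}
\sum_{i,j,k=1}^{r} (x_{ijk} - \bar{x}_{...})^2
  &= \sum_{i,j,k=1}^{r} (\bar{x}_{i..} - \bar{x}_{...})^2
   + \sum_{i,j,k=1}^{r} (\bar{x}_{.j.} - \bar{x}_{...})^2 \\
  &\quad + \sum_{i,j,k=1}^{r} (\bar{x}_{..k} - \bar{x}_{...})^2 \\
  &\quad + \sum_{i,j,k=1}^{r} (x_{ijk} - \bar{x}_{i..} - \bar{x}_{.j.} - \bar{x}_{..k} + 2\bar{x}_{...})^2
\end{aligned}
\tag{8.5.2}
$$

We may also write Equation 8.5.2 as

$$\mathrm{SST} = \mathrm{SSR} + \mathrm{SSC} + \mathrm{SSTr} + \mathrm{SSE} \tag{8.5.3}$$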





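where SSR and SSC are the sums of squares computed from the values in the
rows and columns, respectively. The computational formulas are as follows:

$$\mathrm{SST} = \sum_{i,j,k=1}^{r} x_{ijk}^2 - C \tag{8.5.4}$$

$$\mathrm{SSR} = \frac{\sum_{i=1}^{r} T_{i..}^2}{r} - C \tag{8.5.5}$$

$$\mathrm{SSC} = \frac{\sum_{j=1}^{r} T_{.j.}^2}{r} - C \tag{8.5.6}$$

$$\mathrm{SSTr} = \frac{\sum_{k=1}^{r} T_{..k}^2}{r} - C \tag{8.5.7}$$

$$\mathrm{SSE} = \mathrm{SST} - \mathrm{SSR} - \mathrm{SSC} - \mathrm{SSTr} \tag{8.5.8}$$

The correction factor C is given by

$$C = \frac{\left(\sum_{i,j,k=1}^{r} x_{ijk}\right)^2}{r^2} = \frac{T_{...}^2}{r^2} \tag{8.5.9}$$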

5. ANOVA table. Table 8.5.3 is the ANOVA table for the Latin square design.

6. Decision. When the given assumptions hold, we can test the hypothesis of
equal treatment means by comparing the computed F = MSTr/MSE with the
critical value of F for α with $r - 1$ numerator degrees of freedom and
$r^2 - 3r + 2$ denominator degrees of freedom. If the computed F equals or
exceeds the critical value of F, we reject the null hypothesis.

The following example illustrates the analysis of variance for the Latin square 
design. 

EXAMPLE 8.5.1 Recall the case at the beginning of this section, in which a
researcher wishes to compare the tensile strength of plastic after eliminating the
effects of technician and equipment. Table 8.5.4 shows the results of the
experiment.

The analysis of variance of these data is as follows:

1. Model. Since we have eliminated two sources of extraneous variation, the
model specified in Equation 8.5.1 appears to be appropriate.


TABLE 8.5.3 Analysis-of-variance table for the Latin square design

Source        SS      df               MS                            F

Rows          SSR     r - 1            MSR = SSR/(r - 1)
Columns       SSC     r - 1            MSC = SSC/(r - 1)
Treatments    SSTr    r - 1            MSTr = SSTr/(r - 1)           MSTr/MSE
Error         SSE     r^2 - 3r + 2     MSE = SSE/(r^2 - 3r + 2)
Total         SST     r^2 - 1


TABLE 8.5.4 Tensile strength (coded) of 16 plastic specimens, by technician and equipment

Rows                      Columns (equipment)
(technician)      1         2         3         4         Row totals   Row means

1                 A = 13    C = 13    D = 7     B = 15    48           12.00
2                 B = 12    A = 13    C = 11    D = 9     45           11.25
3                 C = 11    D = 9     B = 13    A = 14    47           11.75
4                 D = 7     B = 13    A = 12    C = 11    43           10.75

Column totals     43        48        43        49        183
Column means      10.75     12.00     10.75     12.25                  11.44

Treatment totals: A = 52, B = 53, C = 46, D = 32.
Treatment means: A = 13.00, B = 13.25, C = 11.50, D = 8.00.




2. Assumptions. The assumptions for the Latin square design are presumed to 
hold. 

3. Hypotheses. We state the following hypotheses regarding treatments:

$$H_0\colon \mu_A = \mu_B = \mu_C = \mu_D, \qquad H_1\colon \text{at least one equality does not hold}$$

Let α = 0.05.

4. Calculations. From the data of Table 8.5.4, we may compute the following
sums of squares:

$$\mathrm{SST} = 13^2 + 12^2 + \cdots + 11^2 - \frac{183^2}{16} = 2177 - 2093.06 = 83.94$$

$$\mathrm{SSR} = \frac{48^2 + 45^2 + 47^2 + 43^2}{4} - 2093.06 = 3.69$$

$$\mathrm{SSC} = \frac{43^2 + 48^2 + 43^2 + 49^2}{4} - 2093.06 = 7.69$$

$$\mathrm{SSTr} = \frac{52^2 + 53^2 + 46^2 + 32^2}{4} - 2093.06 = 70.19$$

$$\mathrm{SSE} = 83.94 - 3.69 - 7.69 - 70.19 = 2.37$$


5. ANOVA table. Table 8.5.5 is the analysis-of-variance table for this example. 

6. Decision. The critical value of F for α = 0.05, 3 numerator degrees of freedom,
and 6 denominator degrees of freedom is 4.76. Since the computed F of
58.5 is larger than the critical value, we reject the null hypothesis. The evidence
from this sample indicates that there is a difference in the treatment means. That
is, we conclude that the processing methods do have a differential effect. For this
test p < 0.005, since 58.5 > 12.92.
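
The calculations of Example 8.5.1 can be reproduced with a short Python sketch that applies the computational formulas 8.5.4 through 8.5.9 to the data of Table 8.5.4. The function name `latin_square_anova` and the layout of `data` (one list per row, each cell a treatment-value pair) are illustrative assumptions. Because the sketch works with unrounded mean squares, it gives F of about 59.1 rather than the 58.5 obtained above from the rounded values 23.40 and 0.40; the decision is unchanged.

```python
# Table 8.5.4: rows are technicians, columns are equipment.
# Each cell is (treatment letter, coded tensile strength).
data = [
    [("A", 13), ("C", 13), ("D", 7), ("B", 15)],
    [("B", 12), ("A", 13), ("C", 11), ("D", 9)],
    [("C", 11), ("D", 9), ("B", 13), ("A", 14)],
    [("D", 7), ("B", 13), ("A", 12), ("C", 11)],
]


def latin_square_anova(data):
    """Sums of squares and treatment F for an r x r Latin square."""
    r = len(data)
    values = [x for row in data for _, x in row]
    c = sum(values) ** 2 / r ** 2                      # correction factor (8.5.9)
    sst = sum(x * x for x in values) - c               # (8.5.4)

    row_totals = [sum(x for _, x in row) for row in data]
    col_totals = [sum(data[i][j][1] for i in range(r)) for j in range(r)]
    trt_totals = {}
    for row in data:
        for treatment, x in row:
            trt_totals[treatment] = trt_totals.get(treatment, 0) + x

    ssr = sum(t * t for t in row_totals) / r - c       # (8.5.5)
    ssc = sum(t * t for t in col_totals) / r - c       # (8.5.6)
    sstr = sum(t * t for t in trt_totals.values()) / r - c  # (8.5.7)
    sse = sst - ssr - ssc - sstr                       # (8.5.8)

    df_error = (r - 1) * (r - 2)                       # equals r^2 - 3r + 2
    f_ratio = (sstr / (r - 1)) / (sse / df_error)
    return {"SST": sst, "SSR": ssr, "SSC": ssc, "SSTr": sstr,
            "SSE": sse, "df_error": df_error, "F": f_ratio}


result = latin_square_anova(data)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Comparing `result["F"]` with the critical value 4.76 quoted in the text leads to the same rejection of the null hypothesis.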

We can use Tukey’s multiple-comparison procedure with the analysis of data 
from a Latin square design. [See, for example, Guenther (1964) and Kirk (1968).] 


Exercises

Carry out the six-step analysis-of-variance procedure at the indicated level of significance
and compute the p value for the test.




TABLE 8.5.5 Analysis-of-variance table for Example 8.5.1

Source                  SS       df     MS       F

Rows (technicians)       3.69     3      1.23
Columns (equipment)      7.69     3      2.56
Treatments              70.19     3     23.40    58.5
Error                    2.37     6      0.40
Total                   83.94    15



8.5.1 The quality of a certain plastic product depends to some extent on the number of 
breaks per 100 lb of material that occur during one of the phases of production. The 
manufacturer tests four methods of treatment designed to reduce breakage: A, B, C, and 
D. The treatments are applied to the raw material before the critical phase of production. 
To eliminate environmental effects such as temperature and humidity, the 24-hour workday 
is divided into four 6-hour periods. Each treatment is used once in each period. Four makes 
of machine are normally used in the process. Thus each treatment is used once on each 
machine. The following table shows the number of breaks per 100 lb of material for each 
treatment, by time period and machine. Do these data suggest a differential treatment 
effect at the 0.05 level of significance? At the 0.01 level? 


                     Time period
Machine      I        II       III      IV

1            A(2)     B(6)     C(16)    D(9)
2            B(8)     D(9)     A(3)     C(16)
3            C(16)    A(3)     D(10)    B(7)
4            D(6)     C(12)    B(7)     A(4)



8.5.2 An engineer who wants to evaluate 4 brands of lubricating oil uses a Latin square 
design, as shown in the following table. The columns represent the four seasons of the 
year, and the rows represent four makes of car. The variable of interest is the consumption 
of fuel, in gallons per 100 miles traveled. Test at the 0.05 level of significance the null 
hypothesis of no difference between treatment means. 


                         Seasons
Vehicle make     F        W        Sp       Su

1                A(12)    B(10)    C(10)    D(12)
2                B(10)    A(12)    D(12)    C(10)
3                C(10)    D(11)    B(10)    A(12)
4                D(11)    C(10)    A(13)    B(10)



8.5.3 To study the effect of packaging on the sales of a certain cereal, a researcher tries
three different packaging methods (treatments) at three different times of the week
(columns) in three different supermarket chains (rows). The variable of interest is daily sales.
The following table shows the results of the study. Do these data show a significant
difference in shoppers’ response to the different packaging methods? Let α = 0.05.


                 Time of week
Store      First     Middle    End

I          C(30)     A(45)     B(75)
II         B(50)     C(40)     A(50)
III        A(35)     B(65)     C(50)





8.6 THE FACTORIAL EXPERIMENT 


The manufacturer of a new product wishes to study the effect on sales of the
product of different packaging and the effect of availability in different types of
stores. Three different kinds of packages are to be tested. The three types of store
are grocery stores, drugstores, and variety stores. The price and quantity per
package are the same, and other variables are felt to be satisfactorily controlled.
The experimental format used is that known as the factorial experiment. This type
of experiment allows two or more factors to be studied simultaneously. A factor
is a kind of treatment. Stores and packages are the two factors of interest in the
present example. Not only can we investigate the effects of the individual factors,
but when we conduct the experiment properly, we can also study the interaction
between the two factors.

Interaction

We say that:

There is interaction between two factors if a change in one of the factors
produces a change in response at one level of the other factor different from
that produced at other levels of this second factor, where a level is one of the
treatments within a factor.

In the present example, there are three levels of each factor. Each type of store
is a level of the store factor, and each package type is a level of the package
factor. The two factors are called Factor A and Factor B. Factor A occurs at three
levels, $a_1$, $a_2$, and $a_3$. Factor B occurs at three levels, $b_1$, $b_2$, and $b_3$. The following
example illustrates the concept of interaction.

EXAMPLE 8.6.1 Suppose that we are studying how well factory employees learn
safety rules. Let’s say that we know the true relationship between three methods
of instruction and the educational level of the employees. Suppose further that
workers’ education can be classed at two levels: “less than high school” and
“high school or higher.” If we know the true relationship between the two factors,
we also know, for the three methods of instruction, the mean effect on learning
of subjects in the two education groups. We measure that effect in terms of scores
made on a test taken immediately after instruction. Suppose that these means are
as shown in Table 8.6.1. The following features of the data in Table 8.6.1 are
important in understanding interaction.

TABLE 8.6.1 Scores of subjects in two education groups at three methods of instruction levels

                                    Factor B: Method of instruction
Factor A: Education                 j = 1        j = 2        j = 3

Less than high school (i = 1)       μ_11 = 15    μ_12 = 30    μ_13 = 60
High school or higher (i = 2)       μ_21 = 30    μ_22 = 45    μ_23 = 75

1. For both levels of Factor A, the difference between the means for any two
levels of Factor B is the same. That is, for both levels of Factor A, the difference
between means for levels 1 and 2 of Factor B is 15, the difference for levels 2
and 3 is 30, and the difference for levels 1 and 3 is 45.

2. For all levels of Factor B, the difference between means for the two levels of 
Factor A is the same. In the present case, the difference is 15 at all three levels 
of Factor B. 

3. We see a third characteristic when we plot the data as in Figure 8.6.1. The 
curves corresponding to the different levels of a factor are all parallel. 

When population data have these three characteristics, we say that there is no 
interaction present. 

Interaction between two factors affects the characteristics of data in a variety 
of ways, depending on the nature of the interaction. Table 8.6.2 shows the data 
of Table 8.6.1 altered to show the effect of one type of interaction. 


FIGURE 8.6.1 Effects of method of instruction and education, no interaction present

TABLE 8.6.2 The data of Table 8.6.1 altered to show the effect of one type of interaction

                                    Factor B: Method of instruction
Factor A: Education                 j = 1        j = 2        j = 3

Less than high school (i = 1)       μ_11 = 15    μ_12 = 30    μ_13 = 60
High school or higher (i = 2)       μ_21 = 45    μ_22 = 30    μ_23 = 15

FIGURE 8.6.2 Effects of education and method of instruction, interaction present


The following characteristics of the data in Table 8.6.2 are important in
understanding interaction:

1. The difference between means for any two levels of Factor B is not the same
for both levels of Factor A. In Table 8.6.2, for example, the difference between
levels 1 and 2 of Factor B is -15 for the less-than-high-school group and +15
for the high-school-or-higher group.

2. The difference between means for both levels of Factor A is not the same at
all levels of Factor B. The differences between Factor A means are -30, 0, and
+45 for levels 1, 2, and 3, respectively, of Factor B.

3. Figure 8.6.2 shows that the factor-level curves are not parallel. 

When population data have these characteristics, we say that there is interaction 
between the two factors. We emphasize that the kind of interaction illustrated by 
this example is only one of many types of interaction that may occur between two 
factors. 
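
The parallel-curves criterion can be checked numerically. The Python sketch below is an illustration only: it takes the known population means of Tables 8.6.1 and 8.6.2, computes for each level of Factor A the differences between adjacent Factor B means, and reports interaction when those differences are not the same at every level of Factor A. The function names are hypothetical, and the check applies to known population means, not to sample data, where an F test is needed.

```python
def b_level_differences(means):
    """For each Factor A level (row), differences between adjacent Factor B means."""
    return [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in means]


def has_interaction(means):
    """True if the Factor B differences are not identical across Factor A levels."""
    diffs = b_level_differences(means)
    return any(d != diffs[0] for d in diffs[1:])


# Rows: education level (i = 1, 2); columns: method of instruction (j = 1, 2, 3).
table_8_6_1 = [[15, 30, 60], [30, 45, 75]]
table_8_6_2 = [[15, 30, 60], [45, 30, 15]]

print(b_level_differences(table_8_6_1), has_interaction(table_8_6_1))
print(b_level_differences(table_8_6_2), has_interaction(table_8_6_2))
```

Identical rows of differences correspond to parallel curves in a plot such as Figure 8.6.1; differing rows correspond to the nonparallel curves of Figure 8.6.2.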

One advantage of the factorial experiment is that using all the observations to 
study the effects of each of the factors under investigation saves time and effort. 
When we are investigating two factors, we can use a single factorial experiment 
rather than the two different experiments that we would otherwise need—one to 
study each of the two factors. If we conduct two separate experiments, we need 
more experimental units to achieve the level of accuracy of a factorial experiment. 
Thus one two-factor experiment is more efficient than two one-factor experiments. 






Another advantage of the factorial experiment is that, since the various factors 
are combined in one experiment, the results have a wider range of application. 

We can adapt a factorial experiment to any of the designs we have described. 
We illustrate the analysis of a factorial experiment using a two-factor completely 
randomized design. We can present the results of such an experiment in a table 
such as Table 8.6.3. This table shows a levels of factor A, b levels of factor B, 
and n observations for each combination of levels. Each of the ab combinations 
of levels of factor A with levels of factor B is a treatment. In addition to the totals 
and means shown in Table 8.6.3, we may designate the total and mean of the ijth 
cell by

$$T_{ij.} = \sum_{k=1}^{n} x_{ijk} \quad \text{and} \quad \bar{x}_{ij.} = \frac{T_{ij.}}{n}$$

respectively. The subscript i runs from 1 to a and j runs from 1 to b. The total
number of observations is nab.

The analysis of variance for the factorial experiment is as follows. 

1. Model. For the sake of brevity, we consider only one type of factorial
experiment: the fixed-effects, two-factor, completely randomized design. Under the
general heading of factorial experiment, we could also consider the random model,
the mixed model, and the experiment in which more than two factors are involved.
For discussion of these topics, see the textbooks on experimental design cited
earlier.

We may write the model for the present case as

$$x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk} \tag{8.6.1}$$


TABLE 8.6.3 Sample data from a two-factor completely randomized experiment

                          Factor B
Factor A      1         2         ...     b         Total     Means

1             x_111     x_121     ...     x_1b1
              ...       ...               ...       T_1..     x̄_1..
              x_11n     x_12n     ...     x_1bn

2             x_211     x_221     ...     x_2b1
              ...       ...               ...       T_2..     x̄_2..
              x_21n     x_22n     ...     x_2bn

...

a             x_a11     x_a21     ...     x_ab1
              ...       ...               ...       T_a..     x̄_a..
              x_a1n     x_a2n     ...     x_abn

Total         T_.1.     T_.2.     ...     T_.b.     T_...
Means         x̄_.1.     x̄_.2.     ...     x̄_.b.               x̄_...

where $x_{ijk}$ is a typical observation, $\mu$ is a constant, $\alpha$ represents an effect due to
factor A, $\beta$ represents an effect due to factor B, $(\alpha\beta)$ represents an effect due to
the interaction of factors A and B, and $e_{ijk}$ represents the experimental error.

2. Assumptions. The assumptions for the factorial experiment are: (a) The
observations in each of the ab cells constitute an independent random sample of size
n drawn from the population defined by the particular combination of the levels
of the two factors. (b) Each of the ab populations is normally distributed. (c) The
populations all have the same variance.

3. Hypotheses. We can compute three mean squares, other than the residual mean
square, from data generated by a factorial experiment of this type. These are the
mean squares associated with factor A, factor B, and the interaction AB.
Consequently we can compute three variance ratios, using each mean square in turn
in the numerator and the residual mean square in each denominator. There are,
then, three separate hypotheses that we may test. They are

(a) $H_0$: $\alpha_i = 0$, $H_1$: not all $\alpha_i = 0$, i = 1, 2, ..., a

(b) $H_0$: $\beta_j = 0$, $H_1$: not all $\beta_j = 0$, j = 1, 2, ..., b

(c) $H_0$: $(\alpha\beta)_{ij} = 0$, $H_1$: not all $(\alpha\beta)_{ij} = 0$, i = 1, 2, ..., a; j = 1, 2, ..., b

Before collecting the data, we may decide to test only one of the three
hypotheses. In this case we select the one to be tested, along with the desired
significance level α.

If, on the other hand, we wish to test all three hypotheses, a problem arises
because the three tests are not independent in the probability sense. Suppose that
α is the significance level associated with the test as a whole, and α′, α″, and α‴
are the significance levels associated with hypotheses a, b, and c, respectively.
Kimball (1951) has shown that

$$\alpha < 1 - (1 - \alpha')(1 - \alpha'')(1 - \alpha''')$$

If α′ = α″ = α‴ = 0.05, then $\alpha < 1 - (0.95)^3$, or α < 0.143. This result
indicates that the probability of rejecting one or more of the three hypotheses is
something less than 0.143 when a significance level of 0.05 has been chosen for
each hypothesis and all are true. To demonstrate the hypothesis-testing procedure
for each case, we shall perform all three tests. However, you should keep in mind
the problem involved in interpreting the results. Dixon and Massey (1969) and
Guenther (1964) discuss this problem.
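
The upper bound in the inequality above is easy to evaluate for any set of individual significance levels. The short Python sketch below does so; the function name `overall_level_upper_bound` is a hypothetical label, and the function only evaluates the right-hand side of the inequality, not the exact overall level.

```python
def overall_level_upper_bound(levels):
    """Evaluate 1 - (1 - a')(1 - a'')... for the given individual levels."""
    prob_no_rejection = 1.0
    for level in levels:
        prob_no_rejection *= (1.0 - level)
    return 1.0 - prob_no_rejection


bound = overall_level_upper_bound([0.05, 0.05, 0.05])
print(f"Upper bound on the overall significance level: {bound:.3f}")
```

With three tests at the 0.05 level, the bound is 1 - (0.95)^3, which rounds to 0.143, the figure given in the text.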

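4. Calculations. By an adaptation of the procedure used in the completely
randomized design, we can partition the total sum of squares under the present model
into two parts, as follows:

$$
\sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (x_{ijk} - \bar{x}_{...})^2
  = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (\bar{x}_{ij.} - \bar{x}_{...})^2
  + \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (x_{ijk} - \bar{x}_{ij.})^2
\tag{8.6.2}
$$

or

$$\mathrm{SST} = \mathrm{SSTr} + \mathrm{SSE} \tag{8.6.3}$$

We can partition the sum of squares for treatments into three parts, as follows:

$$
\begin{aligned}
\sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (\bar{x}_{ij.} - \bar{x}_{...})^2
  &= \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (\bar{x}_{i..} - \bar{x}_{...})^2
   + \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (\bar{x}_{.j.} - \bar{x}_{...})^2 \\
  &\quad + \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (\bar{x}_{ij.} - \bar{x}_{i..} - \bar{x}_{.j.} + \bar{x}_{...})^2
\end{aligned}
\tag{8.6.4}
$$

or

$$\mathrm{SSTr} = \mathrm{SSA} + \mathrm{SSB} + \mathrm{SSAB} \tag{8.6.5}$$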
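The computing formulas for the various components are as follows:

$$\mathrm{SST} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} x_{ijk}^2 - C \tag{8.6.6}$$

$$\mathrm{SSTr} = \frac{\sum_{i=1}^{a} \sum_{j=1}^{b} T_{ij.}^2}{n} - C \tag{8.6.7}$$

$$\mathrm{SSE} = \mathrm{SST} - \mathrm{SSTr} \tag{8.6.8}$$

$$\mathrm{SSA} = \frac{\sum_{i=1}^{a} T_{i..}^2}{bn} - C \tag{8.6.9}$$

$$\mathrm{SSB} = \frac{\sum_{j=1}^{b} T_{.j.}^2}{an} - C \tag{8.6.10}$$

and

$$\mathrm{SSAB} = \mathrm{SSTr} - \mathrm{SSA} - \mathrm{SSB} \tag{8.6.11}$$

In these equations,

$$C = \frac{\left(\sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} x_{ijk}\right)^2}{abn} \tag{8.6.12}$$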


5. ANOVA table. In general, we can display the results of the calculations for a
two-factor completely randomized fixed-effects model experiment as shown in
Table 8.6.4. The degrees-of-freedom column in Table 8.6.4 shows that to carry
out the analysis of variance for this type of factorial experiment, there must be
more than one observation per cell. If there is only one, the degrees of freedom
are all expended on the two factors and interaction. As a result, there are no
degrees of freedom with which to compute the error mean square, a necessary
factor in the computation of the variance ratio.

TABLE 8.6.4 Analysis-of-variance table for a two-factor completely randomized experiment (fixed-effects model)

Source        SS      df                MS                               F

A             SSA     a - 1             MSA = SSA/(a - 1)                MSA/MSE
B             SSB     b - 1             MSB = SSB/(b - 1)                MSB/MSE
AB            SSAB    (a - 1)(b - 1)    MSAB = SSAB/[(a - 1)(b - 1)]     MSAB/MSE
Treatments    SSTr    ab - 1
Error         SSE     ab(n - 1)         MSE = SSE/[ab(n - 1)]
Total         SST     abn - 1

6. Decision. If the assumptions stated earlier hold, and if each null hypothesis is
true, each of the variance ratios in Table 8.6.4 is distributed as F with the indicated
degrees of freedom. Any variance ratio equal to or greater than the critical value
of F causes rejection of the associated null hypothesis. When we reject
$H_0$: $\alpha_1 = \alpha_2 = \cdots = \alpha_a$, we conclude that there are differences among the levels of A.
If we reject $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_b$, we conclude that there are differences
among the levels of B. When we reject $H_0$: $(\alpha\beta)_{ij} = 0$, we conclude that factors
A and B interact.

When we reject the hypothesis of no interaction, we are usually interested in
the effects of interaction rather than the levels of factors A and B. In other words,
we are interested in learning what combinations of levels are significantly
different. We shall not deal with this problem in this text. [For additional information,
see Dixon and Massey (1969), Guenther (1964), Steel and Torrie (1980), and
Scheffé (1959).]

Remember that if we make all three tests simultaneously, the probability of
rejecting at least one null hypothesis when all are true is greater than the probability
associated with each individual test.

The following example illustrates the analysis of variance for the factorial
experiment.

EXAMPLE 8.6.2 Table 8.6.5 shows the results of the experiment described at the 
beginning of this section. 

We can carry out the analysis of variance of the data shown in the table as 
follows. 

1. Model. Equation 8.6.1 specifies the appropriate model. 

2. Assumptions. We presume that the assumptions for the factorial experiment 
described earlier are met. 

3. Hypotheses. To demonstrate the hypothesis-testing procedure for each of the 
possible hypotheses, we perform all three tests. Recall, however, the problem 
involved in interpreting the results. For this example, we can test the following 
hypotheses. 



TABLE 8.6.5 Sample data, Example 8.6.2

Factor A                  Factor B (package type) levels
(store type) levels       1        2        3        Totals    Means

                          5        6        4
I                         6        8        3        48        5.33
                          4        7        5

                          7        5        3
II                        8        5        6        52        5.78
                          8        6        4

                          3        6        8
III                       2        6        9        49        5.44
                          4        5        6

Total                     47       54       48       149
Means                     5.22     6.00     5.33               5.52

Cell     a1b1    a1b2    a1b3    a2b1    a2b2    a2b3    a3b1    a3b2    a3b3
Total    15      21      12      23      16      13      9       17      23
Mean     5.00    7.00    4.00    7.67    5.33    4.33    3.00    5.67    7.67


(a) $H_0$: $\alpha_{I} = \alpha_{II} = \alpha_{III} = 0$ (the row [store] effects are all 0)
    $H_1$: not all $\alpha_i = 0$ (they are not all 0)

(b) $H_0$: $\beta_1 = \beta_2 = \beta_3 = 0$ (the column [package] effects are all 0)
    $H_1$: not all $\beta_j = 0$ (they are not all 0)

(c) $H_0$: $(\alpha\beta)_{ij} = 0$ (the interaction effects are all 0)
    $H_1$: not all $(\alpha\beta)_{ij} = 0$ (they are not all 0)


Suppose that we select a significance level of α = 0.05.

4. Calculations

$$\mathrm{SST} = 5^2 + 6^2 + \cdots + 6^2 - \frac{149^2}{27} = 907 - 822.26 = 84.74$$

$$\mathrm{SSTr} = \frac{15^2 + 21^2 + \cdots + 23^2}{3} - 822.26 = 65.41$$

$$\mathrm{SSE} = 84.74 - 65.41 = 19.33$$

$$\mathrm{SSA} = \frac{48^2 + 52^2 + 49^2}{9} - 822.26 = 0.96$$

$$\mathrm{SSB} = \frac{47^2 + 54^2 + 48^2}{9} - 822.26 = 3.18$$

$$\mathrm{SSAB} = 65.41 - 0.96 - 3.18 = 61.27$$



5. ANOVA table. Table 8.6.6 shows the analysis of variance for this example.

TABLE 8.6.6 ANOVA table, Example 8.6.2

Source         SS       df     MS       F

A               0.96     2      0.48     0.45
B               3.18     2      1.59     1.49
AB             61.27     4     15.32    14.32
Treatments     65.41     8
Residual       19.33    18      1.07
Total          84.74    26

6. Decision. The three critical values of F for the three hypotheses are 3.55, 3.55,
and 2.93, respectively. Thus we would reject the last hypothesis and fail to reject
the first two. If we had decided before collecting the data to test the null hypothesis
of no interaction effects, we would conclude that there is interaction between the
type of package used and the type of store where the product is sold. That is,
some combinations of store types and package types attract different revenue than
other combinations. For this test $p_A > 0.10$, $p_B > 0.10$, and $p_{AB} < 0.005$.
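
The calculations of Example 8.6.2 can be reproduced with the Python sketch below, which applies the computing formulas 8.6.6 through 8.6.12 to the data of Table 8.6.5. The function name `two_factor_anova` and the nested-list layout of `cells` are illustrative assumptions, and the sketch handles only the equal-cell-size case discussed in this section. Because it carries unrounded values, its variance ratios (about 0.45, 1.48, and 14.26) differ slightly in the last digit from those in Table 8.6.6, which were computed from rounded mean squares; the decisions are the same.

```python
# cells[i][j] holds the n = 3 observations for store type i and
# package type j, read from Table 8.6.5.
cells = [
    [[5, 6, 4], [6, 8, 7], [4, 3, 5]],   # store type I
    [[7, 8, 8], [5, 5, 6], [3, 6, 4]],   # store type II
    [[3, 2, 4], [6, 6, 5], [8, 9, 6]],   # store type III
]


def two_factor_anova(cells):
    """Sums of squares and F ratios for a balanced two-factor design."""
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    values = [x for row in cells for cell in row for x in cell]
    c = sum(values) ** 2 / (a * b * n)                              # (8.6.12)
    sst = sum(x * x for x in values) - c                            # (8.6.6)

    cell_totals = [[sum(cell) for cell in row] for row in cells]
    sstr = sum(t * t for row in cell_totals for t in row) / n - c   # (8.6.7)
    sse = sst - sstr                                                # (8.6.8)

    a_totals = [sum(row) for row in cell_totals]
    b_totals = [sum(cell_totals[i][j] for i in range(a)) for j in range(b)]
    ssa = sum(t * t for t in a_totals) / (b * n) - c                # (8.6.9)
    ssb = sum(t * t for t in b_totals) / (a * n) - c                # (8.6.10)
    ssab = sstr - ssa - ssb                                         # (8.6.11)

    mse = sse / (a * b * (n - 1))
    return {
        "SST": sst, "SSTr": sstr, "SSE": sse,
        "SSA": ssa, "SSB": ssb, "SSAB": ssab,
        "F_A": (ssa / (a - 1)) / mse,
        "F_B": (ssb / (b - 1)) / mse,
        "F_AB": (ssab / ((a - 1) * (b - 1))) / mse,
    }


anova = two_factor_anova(cells)
for name, value in anova.items():
    print(f"{name}: {value:.2f}")
```

Comparing `F_A` and `F_B` with 3.55 and `F_AB` with 2.93 gives the same conclusions as the text: only the interaction hypothesis is rejected.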

We have discussed only the case in which the number of observations in each
cell is the same. When the number of observations in each cell is not the same,
the analysis is more complicated. For further information on unequal cell
frequencies, see the previously cited references.


Exercises 



Carry out the six-step analysis-of-variance procedure at the indicated level of significance 
and compute the p value for the test. 

8.6.1 The temperature and pressure used in molding a certain plastic affect its tensile 
strength. The following table shows the tensile strength (coded) in pounds per square inch 
of specimens of plastic molded at three different temperatures and under three different 
pressures. Complete the analysis of variance for these data, using a 0.05 significance level. 


                  Temperature
Pressure      I       II      III

               8       9      10
               9       9      10
A              9      10      11
               9       9      11
               9      10      11

              10      10       8
              11      10       8
B             12       9       8
              12       8       9
              12       9       9

               8      12       9
               9      12       8
C              9      11       9
               8      11       9
               8      11       8


8.6.2 The following table shows the weights of salmon caught by species and area.
Complete the analysis of variance of these data, using a 0.05 significance level.

                         Species
Area      King     Red     Coho     Pink     Chum

          12       5       7        3         8
          13       5       8        4        10
1         13       6       6        4         9
          12       5       7        5         9
          13       6       7        4         9

          20       5       6        3         8
          24       6       7        4         9
2         22       6       7        4         8
          23       6       7        4         8
          22       6       7        3         8

          12       5       7        3         7
          11       5       7        4         8
3         12       5       8        4         7
          12       6       7        4         7
          12       5       7        4         8



How might you carry out random sampling in this experimental design? Are you surprised 
that there is an interaction between the two factors, area and species? What is your
interpretation of this phenomenon? What fact(s) could make fishing for King salmon in Area
2 undesirable? What approximate overall level of significance does your statistical test 
have when you test both main effects and the effect of interaction between species and 
area? How many treatments would be required if this experiment were performed as a 
one-way ANOVA? Could a one-way ANOVA detect the effects of interaction? 

8.6.3 A manufacturer introduces a new kind of camera into 4 geographic regions. Within 
each region, 3 subareas with different levels of competitive activity are identified. The 
following table shows the weekly sales (coded) of the product in each subarea for a period 
of 4 weeks. Complete the analysis of variance of these data, using a 0.05 level of
significance.


Level of competitive activity 


Geographic region 


4 


4 8 10 3 

3 7 14 2 

2 6 13 1 

3 6 15 2 



12 

10 

9 

11 


6 

7 

8 
7 


3 

2 

3 

4 


8 

7 

7 

6 



Summary 


This chapter covered the basic concepts and techniques of analysis of variance 
that we use to test for significant differences among several means. We discussed 
this topic in the framework of four experimental designs: the completely randomized,
the randomized block, the Latin square, and the factorial experiment. In
addition, the chapter discussed the problem of multiple tests of significance
between pairs of means. Tukey’s test was demonstrated by an example.

You learned that you can use a technique known as blocking to eliminate 
extraneous sources of variability in an experiment. The objective in blocking is 
to reduce the size of the error term in order to improve the chance of detecting 
any differences that may exist between population means. We use the randomized 
block design to eliminate one source of extraneous variation. We use the Latin 
square design when we wish to eliminate two sources of extraneous variation. 
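
To see why blocking can sharpen a test, the following Python sketch partitions the total sum of squares for a randomized complete block layout and compares the treatment F ratio with the ratio that results when the block classification is ignored. The function name `rcb_anova` and the data are hypothetical and chosen only for illustration.

```python
from statistics import mean

def rcb_anova(table):
    """table[i][j] is the single observation for block i under treatment j."""
    blocks, k = len(table), len(table[0])
    grand = mean(x for row in table for x in row)
    block_means = [mean(row) for row in table]
    treat_means = [mean(row[j] for row in table) for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_blocks = k * sum((m - grand) ** 2 for m in block_means)
    ss_treat = blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_error = ss_total - ss_blocks - ss_treat

    ms_treat = ss_treat / (k - 1)
    ms_error = ss_error / ((blocks - 1) * (k - 1))
    # Without blocking, block variation is absorbed into the error term
    ms_error_unblocked = (ss_error + ss_blocks) / (blocks * k - k)
    return {
        "SS_blocks": ss_blocks,
        "SS_treatments": ss_treat,
        "SS_error": ss_error,
        "F_blocked": ms_treat / ms_error,
        "F_unblocked": ms_treat / ms_error_unblocked,
    }

# Hypothetical data: 3 blocks (rows) by 3 treatments (columns)
data = [[10, 12, 14], [20, 21, 25], [30, 33, 33]]
for name, value in rcb_anova(data).items():
    print(name, value)
```

In this made-up example most of the variability lies between blocks, so removing it from the error term makes the treatment F ratio much larger than it would be in a completely randomized analysis of the same numbers.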

You also learned that we sometimes want to design an experiment in such a 
way that we can measure the effects of interaction. An experiment of this type is 
called a factorial experiment. 

In the interest of brevity, this chapter omitted many related topics. The following
brief comments will introduce you to some additional important topics in
analysis of variance. Further information is available in the texts on experimental 
design and analysis of variance listed in the references. 

Missing Data In the course of an experiment, accidents that result in the loss of 
some of the observations are not uncommon. When this happens, we need some 
method of dealing with the problem. Ostle and Mensing (1975) and Steel and 
Torrie (1980) give numerical examples of two widely used methods for handling 
the problem of missing data. An extensive list of references on missing data is 
found in Federer (1955). References not cited by Federer include the articles by 
Glenn and Kramer (1958), Kramer and Glass (1960), and Baird and Kramer 
(1960). 

Efficiency We often want to know how much improvement we may expect in 
an experiment as a whole if we use one type of design instead of some other type. 
To answer this question, we must determine the relative efficiency of the two 
designs. Cochran and Cox (1968), Steel and Torrie (1980), Scheffe (1959), Ostle 
and Mensing (1975), and Sokal and Rohlf (1969) discuss this topic further. 

Transformations If the data do not meet the assumptions underlying the analysis 
of variance, an alternative procedure is to perform a transformation on the data, 
so that the assumptions are more nearly met. Most of the texts on general statistics 
and experimental design previously cited discuss transformations to some extent. 
The subject is also treated in the texts by Quenouille (1950), Sokal and Rohlf 
(1969), and Pearce (1965), and in an article by Bartlett (1947). Federer (1955) 
gives additional references. 
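
As one illustration of what a transformation can accomplish, the sketch below applies a logarithmic transformation to hypothetical groups whose spread grows with their mean. The choice of the log transformation, the helper name `variance_ratio`, and the data are assumptions made for this example; the texts cited above discuss how to choose a transformation for a given set of data.

```python
import math
from statistics import variance

def variance_ratio(groups):
    """Largest sample variance divided by the smallest sample variance."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

# Hypothetical groups in which the spread is proportional to the mean
groups = [[1, 2, 4], [10, 20, 40], [100, 200, 400]]
logged = [[math.log10(x) for x in g] for g in groups]

print("variance ratio, original scale:", variance_ratio(groups))
print("variance ratio, log scale:     ", variance_ratio(logged))
```

On the original scale the group variances differ by several orders of magnitude, while on the log scale they are essentially equal, so the equal-variance assumption is far more nearly met.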

Nonparametric Alternatives When the assumptions underlying analysis of
variance are not met, we may instead use nonparametric methods of analysis.
We shall discuss these methods more fully in Chapter 12, where we present
appropriate nonparametric alternatives to some of the procedures discussed in this
chapter. In addition to the references cited in this chapter, see the extensive
bibliography on experimental design by Herzberg and Cox (1969).


Review Questions


Where appropriate, do an analysis of variance at the indicated level of significance and 
compute a p value for the test. If no level of significance is indicated, compute the p value 
and state whether or not you think the null hypothesis should be rejected. 

1. Define analysis of variance. 

2. For each of the following experimental designs, describe a situation in your particular 
field of interest in which that design would be appropriate. Use real or realistic data and 
do the appropriate analysis of variance for each: (a) completely randomized design, (b)
randomized complete block design, (c) Latin square design, (d) completely randomized 
design with a factorial experiment. 

3. What are the fundamental assumptions underlying the analysis of variance? 

4. Explain the difference between the fixed-effects model and the random-effects model. 

5. What is the correction factor? How is it computed? How is it used? 

6. What are two other names for the error sum of squares? 

7. When is the randomized complete block design appropriate? 

8. How is randomization achieved in the randomized complete block design? 

9. What is the primary objective in using the randomized complete block design? 

10. Explain the various components of the randomized complete block model. 

11. What is interaction? 

12. What is meant by level? 

13. Explain the components of the factorial model. 

14. In the factorial experiment, why is there always more than one observation on the 
variable of interest for each experimental condition? 

15. Explain the essential features of a Latin square design. 

16. What are the assumptions underlying the Latin square design? 

17. How is randomization achieved when the Latin square design is used? 

18. The following table shows the unit cost of producing a certain product by size of firm. 
Can we conclude at the 0.05 level of significance that there are differences among the 
three sizes of firm with respect to mean unit production cost? 


Small     8  7  7  6  6  6  6  8  8  8

Medium    3  4  4  4  2  4  5  3  4  3

Large     4  5  5  5  3  4  5  4  5  6


19. A psychologist is hired to compare the levels of job satisfaction of salespersons with 
three large companies. Ten salespersons are selected at random from each firm and given 
in-depth tests to ascertain their level of job satisfaction. The results are given in the 
following table. Do these data provide sufficient evidence to indicate a difference in mean 
job satisfaction among the three firms? Let α = 0.01.


Firm A    67  65  59  59  58  61  66  53  51  64

Firm B    66  68  55  59  61  66  62  65  64  74

Firm C    87  80  67  89  80  84  78  65  72  85


20. The following table shows the results of an experiment designed to compare the tensile 
strength of a certain product produced from raw material from 5 different vendors. The 
investigator wishes to eliminate the effects of (a) the size stock from which the product is 
manufactured, and (b) the assembly line on which the processing took place. The data in 
the table have been coded for easy computation. Test at the 0.05 level of significance the 
null hypothesis of no difference in tensile strength among the different vendors. 


                          Stock size

Assembly line      1        2        3        4        5

     I           A(23)    E(21)    C(29)    B(25)    D(20)
     II          D(22)    C(30)    E(22)    A(24)    B(23)
     III         E(19)    A(22)    B(24)    D(22)    C(27)
     IV          C(25)    B(20)    D(21)    E(20)    A(23)
     V           B(21)    D(23)    A(72)    C(31)    E(27)

21. The following table shows the results, in miles per gallon, of an experiment designed 
to compare 4 brands of gasoline. The experiment uses 4 makes of cars. Cars serve as a 
blocking factor. After eliminating car effects, do these data provide sufficient evidence to 
indicate that there are differences among the brands of gasoline? Let α = 0.05.


                             Gasoline

Make of automobile      A      B      C      D

        I              17     18     15     20
        II             21     20     16     21
        III            25     24     20     19
        IV             15     22     15     18


22. An advertising agency conducts an experiment to try to assess the effectiveness of 
various formats of TV commercials. Fifty regular television viewers are randomly assigned 
to view one of five formats of a TV commercial for a cold remedy. On the basis of an 
interview following the viewing, a score measuring the impact of the commercial on the 
participant is recorded. The results are as follows. Can we conclude on the basis of these 
data that the formats differ in their effectiveness? Let α = 0.05.


TV commercial format

A    20  23  21  23  26  24  26  23  20  24

B    28  27  22  28  23  29  27  25  28  21

C    33  34  25  26  27  33  25  32  25  34

D    33  29  31  29  27  25  26  26  33  32

E    49  41  41  39  41  48  43  43  46  35




23. Use Tukey’s multiple-comparison test to determine which pairs of means in Exercise 
22 differ significantly. Let α = 0.05.

24. A study is conducted to compare the job-satisfaction levels of assembly-line employees 
whose working environments are structured to different degrees. Also of interest is the
relationship of length of employment to job satisfaction. The researchers wish to study the
interaction between length of employment and the extent to which the working environment
is structured and the effect of this interaction on job satisfaction.

The following table shows job-satisfaction scores from the study. The data are coded
for ease of calculation. Do an appropriate analysis of variance of these data.

                              Nature of working environment

Length of employment      Highly        Moderately
(years)                   structured    structured    Unstructured

                             12            10              8
                             15            10              7
<5                           15             9              7
                             14            10              8
                             12             9              6

                             12            10             10
                             14            10             11
5-10                         12            14             12
                             10            14             10
                             11            10             14

                              9            10             12
                             10            11             14
11 or more                    9            10             15
                              9            10             15
                             10            12             18

25. Refer to Exercise 19 and use Tukey’s test to make all pairwise comparisons. Let
α = 0.01.

26. In a random sample of 30 junior executives, each is classified by potential for
promotion: good, poor, or uncertain. Each subject is then given a test to measure his or her
level of anxiety. The results (coded for ease of computation) are as follows. What are your
conclusions about this study? Let α = 0.05.

Potential

Good         3  4  3  2  2  5  3  2  5  4  4  4

Poor         4  8  7  10  10  4  8  8

Uncertain    4  5  3  4  3  3  4  4  6  5


27. Industrial psychologists conduct an experiment to evaluate three methods of motivating 
employees in a factory. The researchers feel that they should use a design that will allow 
for blocking by education. Scores (coded) on a test designed to measure effectiveness of 
the methods are as follows. Do these data indicate that there is a difference in treatment 
effects? Let α = 0.05.

                     Treatment group

Education group      A     B     C

       1             8     9     7
       2             5     8     7
       3             3     9     5
       4             4     8     5


28. A textile manufacturer experiments to compare the effects of 4 different processing 
methods on the strength of a synthetic fiber. The investigator uses the randomized complete 
block design, with the 4 sources of raw material serving as blocks. The following table 
gives the strengths (coded) of 16 specimens prepared during the experiment. After
eliminating block effects, can the manufacturer conclude that the different processing methods
do have different effects on fiber strength? Let α = 0.01.


              Processing method

Block      A      B      C      D

  1       11     10     13     14
  2       12     10     16     14
  3       16     17     18     18
  4       17     15     18     18


29. A different brand of floor tile is installed in each of 5 areas of a public building. Five
different cleaning agents are used in such a way that a Latin square design results. The
following table shows measures of deterioration and wear for each brand of tile at the end
of 1 year. Do these data provide sufficient evidence to indicate a difference in durability
among the brands? Let α = 0.05.


                       Cleaning agents

Area       1         2         3         4         5

 I       A(58)     B(62)     C(77)     D(94)     E(66)
 II      B(48)     A(76)     D(103)    E(58)     C(99)
 III     C(74)     D(113)    E(70)     A(105)    B(93)
 IV      E(66)     C(95)     A(111)    B(108)    D(150)
 V       D(110)    E(63)     B(113)    C(126)    A(160)


30. A study is designed to evaluate 4 advertising strategies (A, B, C, and D). Investigators 
use a Latin square design in which the columns represent the 4 seasons of the year and 
the rows represent 4 areas of the country. The variable of interest is consumer knowledge 
of the product advertised. The results (coded) are as follows. Test, at the 0.05 level of 
significance, the null hypothesis of no difference among treatment means. 


                     Season

Area     Fall     Winter    Spring    Summer

 1       A(12)    B(10)     C(10)     D(12)
 2       B(10)    A(12)     D(12)     C(10)
 3       C(10)    D(11)     B(10)     A(12)
 4       D(11)    C(10)     A(13)     B(10)


31. The following table shows the neuroticism scores of 27 employees classified on the 
basis of the stressfulness of their jobs and length of employment. Perform an appropriate 
analysis of variance of these data. 



                             Stressfulness of job (Factor B)

Length of employment       Very         Moderately     Not
(in years) (Factor A)      stressful    stressful      stressful

                              25            18            17
<5                            28            23            24
                              22            19            19

                              28            16            18
5-10                          32            24            22
                              30            20            20

                              25            14            10
>10                           35            16             8
                              30            15            12



32. A research team hired by a fertilizer manufacturer conducts an experiment to study 
the yield of a certain grain using 3 different levels of fertilizer—low, medium, and high. 
Three varieties of seed are used in the experiment, making a total of 9 treatment
combinations. Each treatment combination is assigned at random to one of 27 plots of ground,
so that 3 plots receive each treatment. The yields (coded) are as follows. Perform an
appropriate analysis of variance of these data.


                Fertilizer level

Variety     Low    Medium    High

              5       8       10
   A          8       8       12
              7      10       10

              6      10       15
   B          8      12       14
              6      11       14

              7      12       16
   C          8      12       16
             10      14       18



33. Market researchers designed a study to compare the characteristics of people with 
high exposure to various media. Through intensive interviewing, they classified 40
randomly selected adults on the basis of the extent of their exposure to radio, TV, newspapers,
and magazines. They included only subjects with “high” exposures to each medium. For
each subject, they obtained scores on a wide range of demographic and psychographic
variables. The following table shows the subjects’ scores on a test designed to measure
knowledge of current events. Test for a difference among population means at the 0.05
level. Prepare a report on the results of this study for the president of the market research
firm.


High exposure to

Radio         11  13  16  14  13  17  17  11  15  11

TV            18  21  22  16  15  21  22  25  19  16

Newspapers    11  14  20  16  19  13  15  16  15  14

Magazines     15  15  18  10  14  14  15  18  18  13




34. Researchers wish to study the effect of crowding on the productivity of office workers. 
Office workers of the same age, sex, and level of training and experience are randomly 
assigned to one of three groups representing three levels of crowding: severe, moderate,
or none. The following table shows the results. Can we conclude from these data that
crowding affects productivity? Let α = 0.05. What use can we make of the results of this
experiment?


Severe      22  49  32  37  32  22

Moderate    31  30  43  30  46

None        68  73  78  47  56  59


35. A publishing firm hires a reading specialist to assess the effects of different typefaces— 
A, B, and C—on reading comprehension scores of school children. The following table 
shows, for 20 subjects randomly assigned to a treatment, the difference in the reading 
scores of children given material set in a standard typeface and those given material set 
in experimental typefaces. The material was of equal difficulty. Can we conclude from 
these data that typeface has an effect on reading comprehension? Let α = 0.05. What use
can we make of the results of this experiment?


A    12  13   7  15   7

B    15  11  15  15  14  17  18

C    16  18  24  18  24  23  24  22


36. Researchers conducted an experiment designed to evaluate the effectiveness of four 
different methods—A, B, C, and D—of teaching problem solving. The following table 
shows, by teaching method, the scores made by the participating subjects (who were 
randomly assigned to one of the treatments) when they were forced to solve problems 
following this training. Do these data provide sufficient evidence to indicate that the four 
teaching methods differ in effectiveness? Let α = 0.05. Do the results of this experiment
have any relevance to the world of business? Explain.


A    48  38  20  16  95

B    91  37  53  91  80  38

C    67  61  33  85  99  95  81

D    57  62  50  43  59  60  70


37. Four different groups of kittens were given food with four different flavors—A, B, 
C, and D. Kittens of the same age, sex, and species were randomly assigned to one of 
the four flavors. The following table shows the amount (coded) of food consumed by each 
kitten in each group during a 24-hour period. Test to determine whether the kittens’ 
acceptance of the four flavors differs. Let α = 0.05. How might management use the
results of this experiment in deciding which flavor or flavors to market? Could the results
of this experiment be used in advertising? Explain.


A    12  14  18  11  19  10

B    23  20  17  23  20

C    29  27  30  35  33

D    38  33  40  34  34  37


38. Researchers compared three brands of automobile tires of the same size. The criterion 
was length of life in thousands of miles. Six tires of each brand were tested. The following 
table shows the results. Can we conclude from these data that the three brands are not
equal in quality? Let α = 0.05. Prepare a report on the results of this experiment for the
management of the firm that manufactures Brand B tires. Prepare advertising copy for
Brand B tires using the results of this experiment.


Brand A    43  44  42  50  46  48

Brand B    51  49  52  45  48  50

Brand C    47  41  42  45  43  42


39. Researchers conducted a study to compare the characteristics of assembly-line
employees of a watch factory. Employees were categorized into five achievement groups:
very high, moderately high, average, low, and very low. Researchers selected a random
sample from each group and interviewed and tested them in depth. The following table
shows the self-concept scores of subjects in the five groups. Do these data provide sufficient
evidence to indicate that the populations differ with respect to mean level of self-concept?
Let α = 0.05. What use can management make of the results of this experiment?


Very high    90  90  90  95  90  88

Mod. high    79  83  85  75  93

Average      77  66  85  87  67  73

Low          97  81  92  85

Very low     66  63  83  65  74  64



Effect of Program Content on Viewers' Responses to TV Commercials


Advertising agencies are concerned with the extent of viewers' involvement in 
the content of various programs, and how much this involvement has to do 
with the effectiveness of TV commercials. Two investigators, Soldow and
Principe,* proposed four hypotheses about viewers' responses to commercials. One
hypothesis was that viewers will recall more brand names when they see
commercials embedded in "less involving" programs than in "more involving" ones.

The following three groups, composed of 29 subjects each, participated in 
the experiment: 

1. Those exposed to commercials in more involving programs 

2. Those exposed to commercials in less involving programs 

3. Those who watched commercials only (the control group) 

The mean brand-recall scores for the three groups were as follows. 


Group 1 2 3 

Mean 1.21 2.24 2.28 


*Gary F. Soldow and Victor Principe, "Response to Commercials as a Function of Program Content," Journal
of Advertising Research, 21 (April 1981), 59-65. 




An analysis of variance of the data yielded the following sums of squares: 
among groups, 21.40; within groups, 85.86. 

What can one conclude from these results? Let α = 0.05. What is the p value
for the test? Compare all possible pairs of means. What assumptions are
required?


Job Training and Worker Satisfaction 


The satisfaction of workers is a subject of perennial concern to managers. Most 
managers assume that if they provide workers with more training and
education, the workers will be more satisfied with their jobs. Drexler and Lindell*
conducted a study on (1) whether an objective, specific measure of training/job 
fit (the worker does the job for which he is trained) is directly related to the 
worker's satisfaction with that job, and (2) whether the social aspects of work 
environments are more strongly related to satisfaction when people work in 
jobs for which they are not trained. 

Subjects of the study were 2286 Army personnel. Drexler and Lindell divided 
subjects into two treatment groups: (a) those who were now working in their 
primary military occupational specialty (MOS) and (b) those who were not. 
Drexler and Lindell considered membership in group (a) as an indication of 
training/job fit. They obtained a happiness score by using a seven-item measure 
that touched on the subjects' satisfaction with their pay, their supervisor, their 
co-workers, the organization, their opportunities for advancement, and the 
job itself. 

One of the analyses Drexler and Lindell performed was one-way analysis of 
variance, using 2232 subjects. They tested the null hypothesis of no difference 
between subjects in group (a) and group (b) with respect to mean level of job 
satisfaction. They obtained a between-groups mean square of 13.29 and a 
within-groups mean square of 0.88. 

What are the between-groups and within-groups degrees of freedom for 
this test? What is the computed value of F? Should the null hypothesis be 
rejected at the 0.05 level? Why? What is the p value for the test? What
conclusions can one draw from these results? Construct an ANOVA table for these
results. What assumptions are necessary? 


*John A. Drexler, Jr., and Michael K. Lindell, "Training/Job Fit and Worker Satisfaction," Human Relations,
34 (1981), 907-915. 



9. Simple Linear Regression 
and Correlation 


Chapter Objectives: This chapter introduces you to two of the most widely used
of all statistical techniques: regression analysis and correlation analysis.
After studying this chapter and working the exercises, you should be able to do
the following.

1. State and discuss applications of the simple linear 
regression and correlation models 

2. State the assumptions underlying the two methods of 
analysis 

3. Obtain an equation that you can use for prediction 
and estimation 

4. Perform hypothesis tests to determine whether you 
should conclude that two variables are linearly related 

5. Compute a measure of the strength of the correlation 
between two variables 

6. Perform a hypothesis test to determine whether you 
should conclude that two variables are correlated 

7. Construct a confidence interval for a population 
measure of correlation 


9.1 INTRODUCTION 


In analyzing data generated by a business or industrial operation, we often want 
to know something about the relationship between two variables, X and Y. Is there 
a relationship between the sales of a certain product and the age of persons in the 
various market areas? Do employees who score high on a certain aptitude test 
perform well on the job? What is the nature of the relationship between the amount 
of a certain chemical in some material and its optical density? Between the price 
of a product and demand for that product? Between the hardness and the tensile 
strength of a certain metal? The list of pairs of variables with a relationship of 
potential interest is almost limitless. 

One approach to studying such relationships is the analysis of variance,
discussed in Chapter 8. This chapter will show that we can also examine the nature
of the relationships between variables such as those listed using regression analysis 
and correlation analysis. Although regression and correlation are related, they 
serve different purposes. 

Regression analysis helps one determine the probable form of the relationship 
between variables. The objective of this method of analysis is usually to predict 
or estimate the value of one variable corresponding to a given value of another 
variable. The English scientist Sir Francis Galton (1822-1911) first proposed the
ideas of regression in reports of his research in the area of heredity—first in sweet 
peas and later in human stature. [See Galton (1899, 1908) and Pearson (1930).] 
Galton used first the word reversion and later the word regression to describe a 
tendency of adult offspring, even those with short or tall parents, to revert back 
toward the average height of the general population. 

Correlation analysis is concerned with measuring the strength of the relationship
between variables. When we compute measures of correlation from a set of
bivariate data, our interest focuses on the degree of correlation between the
variables. The concepts and terminology of correlation analysis also originated with
Galton (1888), who first used the word correlation in that year.

In this chapter we shall limit our discussion of regression and correlation to 
studying the form and strength of the relationship between two variables. The 
order of presentation is as follows: 

1. The regression model

2. The assumptions underlying simple linear regression

3. Obtaining the regression equation

4. Evaluating the regression equation

5. Using the regression equation

6. The correlation model 

7. A measure of the strength of a relationship 

8. Considerations in deciding between regression and correlation 

9. Some precautions 

In Chapter 10 we consider relationships among three or more variables. 



9.2 THE SIMPLE LINEAR REGRESSION MODEL 


The typical regression problem is like most problems in applied statistical
inference. We have available for analysis a sample of observations from some real or
hypothetical population. On the basis of our analysis of these data, we want to
reach decisions about the population from which we presume the sample was
drawn. In order to handle the analysis intelligently, and interpret the results
properly, we must understand the nature of the population from which the sample was
drawn. We should know enough about the population to be able either to construct
a mathematical model to represent it or to determine whether it fits some
established model reasonably well.

Suppose, for example, that we want to study the relationship between workers’ 
aptitude for a certain job and their satisfaction in that job. After we observe that 
employees who have a greater aptitude for the job also seem to be better satisfied 
with the job, we might suspect that the relationship between the two variables is 
linear. If we can learn enough about this suspected relationship, we may be able 
to predict a prospective employee’s level of job satisfaction on the basis of a 
knowledge of his or her level of aptitude for the job. In this case, the unit of 
association is the employee. The variable aptitude may be designated by X and 
the variable job satisfaction by Y. To obtain data on which to base our study of 
the relationship between the two variables, we would select a random sample of 
employees. We would give each of them two tests—one to measure aptitude for 
the job and one to measure level of job satisfaction. 

Most statistical models that are of practical value do not conform perfectly to 
the real world. A model that fits the situation at hand perfectly is usually too 
complicated for practical use. On the other hand, an analysis that has forced the 
sample data into a model that is not applicable is worthless. Fortunately we can 
get useful results from a model that falls somewhere between these two extremes. 

The type of relationship between the two variables X and Y that is of concern 
here is a linear relationship. This implies that the relationship of interest has 
something to do with a straight line. The measurements that are available for 
analysis come in pairs, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where the measurements
$(x_i, y_i)$ are taken on the same entity, called the unit of association.

Two variables X and Y are linearly related if their relationship can be expressed 
by the following simple linear model: 

$$y_i = \alpha + \beta x_i + e_i \qquad (9.2.1)$$

where $y_i$ is the value of the Y variable for a typical unit of association from the
population, $x_i$ is the value of the X variable for that same unit of association, $\alpha$
and $\beta$ are parameters called the regression constant and the regression coefficient,
respectively, and $e_i$ is a random variable with a mean of 0 and a variance of $\sigma^2$.
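
A small simulation can make the simple linear model concrete. The sketch below generates observations by adding a random error to a straight line. The function name `simulate_linear_model`, the parameter values, and the use of normally distributed errors are assumptions made for this illustration; the model itself requires only that the errors have mean 0 and a common variance.

```python
import random

def simulate_linear_model(alpha, beta, sigma, xs, seed=0):
    """Return (x, y) pairs with y = alpha + beta * x + e, where each e is
    drawn independently from a normal distribution with mean 0 and
    standard deviation sigma (normality is an assumption of this sketch)."""
    rng = random.Random(seed)
    return [(x, alpha + beta * x + rng.gauss(0, sigma)) for x in xs]

# Hypothetical parameter values and X values
pairs = simulate_linear_model(alpha=2.0, beta=0.5, sigma=1.0, xs=[1, 2, 3, 4, 5])
for x, y in pairs:
    print(x, round(y, 2))
```

Setting `sigma` to 0 removes the error term, and every simulated point then falls exactly on the line with intercept `alpha` and slope `beta`.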

Note the similarity between the model of Equation 9.2.1 and the one-way 
analysis-of-variance model of Equation 8.2.6. The reason is that regression analysis
and analysis of variance are essentially the same. In fact, we can get the same
results obtained in Chapter 8 through analysis of variance by using appropriate
regression models, in which treatments, blocks, factors, and so on are identified
as variables, either qualitative or quantitative. A further discussion of this point
would be too complex for this text. [For more information, see the books by
Kempthorne (1952) and Mendenhall (1968).]

To understand the model of Equation 9.2.1, we must consider the assumptions 
underlying simple linear regression. 


9.3 THE ASSUMPTIONS UNDERLYING SIMPLE LINEAR REGRESSION 

As we have said, simple linear regression analysis is concerned with the relationship
between two variables, X and Y. For reasons that will become apparent, the
variable X is called the independent variable, and Y is called the dependent
variable. In discussing the linear relationship between X and Y, given in Equation
9.2.1, we speak of the regression of Y on X.

The following assumptions underlie the simple linear regression model of
Equation 9.2.1:

1. Values of the independent variable X may be either “fixed” or random. That 
is, we may select the values of X in advance (“fixed”), so that as we collect the 
data, we control the values of X. Or we may obtain the values of X without 
imposing any restrictions, in which case X is a random variable. When the X’s 
are nonrandom, we refer to the regression model as the classic regression model,
which is model I of Chapter 8. When X is a random variable, we have model II 
of Chapter 8. As Section 9.7 will show, this is the model required for correlation 
analysis. 

2. The variable X is measured without error. From a practical point of view, this 
means that the magnitude of the measurement error in X is negligible. 

3. For each value of X there is a subpopulation of Y values. For most of the
inferential procedures of estimation and hypothesis testing to be valid, these
subpopulations must be normally distributed. To demonstrate inferential procedures,
we shall assume in the examples and exercises that follow that the Y values are
normally distributed.

4. The variances of the subpopulations of Y are all equal.

5. The means of the subpopulations of Y all lie on the same straight line. This
assumption is known as the assumption of linearity. It may be expressed
symbolically as

$\mu_{y|x} = \alpha + \beta x_j$    (9.3.1)

where $\mu_{y|x}$ is the mean of the subpopulation of Y values assumed to exist for $x_j$,
a particular value of X. When viewed geometrically, as in Figure 9.3.1, $\alpha$ and $\beta$
represent the Y intercept and slope, respectively, of the line on which all the
subpopulation means are assumed to lie.



FIGURE 9.3.1 
Representation of 
the simple linear 
regression model 



6. The Y values are statistically independent. This means that in drawing the 
sample, the values of Y chosen at one value of X in no way depend on the values 
of Y chosen at another value of X. 

We are now in a position to shed some more light on the term $e_i$ in the simple
linear model. Solving Equation 9.2.1 for $e_i$, we have

$e_i = y_i - (\alpha + \beta x_i)$    (9.3.2)

Thus $e_i$ shows the amount by which $y_i$ deviates from the mean of the subpopulation
of Y values from which it is drawn, since, by Equation 9.3.1, $\mu_{y|x} = \alpha + \beta x_i$.
The subpopulations of Y values are assumed to be normally distributed with equal
variances. Thus the $e_i$'s for each subpopulation are also normally distributed, with
a variance equal to $\sigma^2$, the common variance of the subpopulations of Y values.
The $e_i$'s are independent, and their distribution has a mean of 0.
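
The assumptions can be made concrete with a small simulation. The following Python sketch draws Y values from normal subpopulations whose means lie on a straight line. The parameter values (alpha = 100, beta = 0.5, sigma = 10), the function name simulate_y, and the chosen X values are illustrative assumptions and do not come from the text.

```python
import random

def simulate_y(x_values, alpha, beta, sigma, seed=0):
    """Return one simulated Y per x under y = alpha + beta*x + e, e ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in x_values]

# Hypothetical parameter values, chosen only for illustration.
alpha, beta, sigma = 100.0, 0.5, 10.0

# "Fixed" X values selected in advance (assumption 1).
x_fixed = [40, 60, 80, 100, 120]
y_sim = simulate_y(x_fixed, alpha, beta, sigma)

# Equation 9.3.2: each error is the deviation of y from its subpopulation mean.
errors = [y - (alpha + beta * x) for x, y in zip(x_fixed, y_sim)]

# Many draws at a single x approximate that subpopulation. Their average
# should be close to the subpopulation mean alpha + beta*x (assumption 5).
draws_at_50 = simulate_y([50] * 20000, alpha, beta, sigma, seed=1)
mean_at_50 = sum(draws_at_50) / len(draws_at_50)
print(round(mean_at_50, 1))
```

With sigma set to 0 every simulated point falls exactly on the line, which is one way to see that the error term alone accounts for the scatter about the subpopulation means.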


9.4 OBTAINING THE SAMPLE REGRESSION EQUATION 

The regression model of Equation 9.2.1 is not an equation for a straight line. It
is a symbolic representation of a typical value of the dependent variable Y.
Equation 9.3.1, however, is an equation for a straight line. It is the line that describes
the true relationship between X and $\mu_{y|x}$. The true position of this line is unknown
because $\alpha$ and $\beta$ are unknown. The objective of regression analysis is to estimate
$\alpha$ and $\beta$ in order to make inferences about the true line of regression of Y on X.

TABLE 9.4.1
Production (X) and manufacturing expenses (Y) for 10 selected firms

X (thousands of units)      40   42   48   55   65   79   88  100  120  140
Y (thousands of dollars)   150  140  160  170  150  162  185  165  190  185

We can explain the procedures involved in regression analysis more easily by 
means of a numerical illustration. 


EXAMPLE 9.4.1 An operations analyst conducts a study to analyze the relationship
between production and manufacturing expenses in the electronics industry. A
sample of n = 10 firms, randomly selected from within the industry, yields the
data in Table 9.4.1. "Manufacturing expenses" is considered to be the dependent
variable. It changes as the volume of production varies. On the other hand, a
change in manufacturing expenses would not necessarily cause a change in volume
of production.

Note that X as well as Y is a random variable here, since we made no effort to
collect data only for firms with preselected values of the independent
variable, production. In this example, we call the firm the unit of association. It
is important that we preserve the pairwise identity of the measurements throughout
the analysis.

A good first step in a study of the relationship between two variables is to make
a scatter diagram, a graph of the observed pairs of observations. We assign values
of the independent variable X to the horizontal axis. We place a dot on the graph
at the intersection of each pair of values of X and Y. Figure 9.4.1 shows the scatter
diagram for these data. The pattern made by the points on the scatter diagram
usually suggests the basic nature of the relationship between two variables. The
points in Figure 9.4.1, for example, appear to be scattered around an invisible
straight line. The scatter diagram also shows that, in general, firms with high
production tend to have high manufacturing costs. These impressions suggest that
the relationship between production and manufacturing expenses may be described
by a straight line crossing the Y axis above the origin and making less than a
45-degree angle with the X axis.

FIGURE 9.4.1
Scatter diagram for Example 9.4.1

We could draw a freehand line through the data. The question is: Would this 
be the best possible line for describing the relationship that exists? It probably 
would not be. Any such freehand line would be subjective, and would reflect any 
defects in the vision or judgment of the person drawing it. We need some objective 
method of drawing a line that, by some criterion, we could call the best line to 
describe the relationship between the two variables. 

The Least-Squares Line

The objective method that we use here to find a line to describe the relationship
between the variables is called the method of least squares. The line obtained by
this method is called the least-squares line.

We may write the equation for a straight line as

$y = a + bx$    (9.4.1)

Here a is the point at which the line crosses the Y axis and b is the amount by
which the line changes per unit change in x. We refer to a as the Y intercept and
b as the slope of the line. To draw a straight line for the sample data, then, we
need only numerical values for a and b. Once we have these values, we can
substitute two different values of X into the equation and get corresponding values
of Y. If we plot the resulting coordinates $(x_1, y_1)$ and $(x_2, y_2)$ on the graph and
connect them, we have a straight line.

Figure 9.4.2 is a graph of a straight line. Here we see the geometric relationships
between the slope, the Y intercept, and a unit change in x.

FIGURE 9.4.2
A linear regression equation illustrating the geometrical interpretations of a and b

We can find numerical values for a and b for any set of data such as that in
the present example by simultaneously solving the following two equations:

$\sum y_i = na + b \sum x_i$    (9.4.2)

$\sum x_i y_i = a \sum x_i + b \sum x_i^2$    (9.4.3)

These equations, obtained by differential calculus, are called the normal
equations. Their solution yields the equation for the least-squares line describing the
relationship between X and Y. The equation is of the form

$\hat{y} = a + bx$    (9.4.4)

where $\hat{y}$ denotes the calculated value of Y for a given X, and a and b are estimates
of $\alpha$ and $\beta$, respectively.

Table 9.4.2 gives the values of $\sum y_i$, $\sum x_i$, $\sum x_i y_i$, $\sum x_i^2$, and n, which are needed
to solve the equations. Substituting values from Table 9.4.2 into Equations 9.4.2
and 9.4.3 gives

$1657 = 10a + 777b, \qquad 132{,}938 = 777a + 70{,}903b$

We may solve these equations by any familiar method to get 

a = 134.79, b = 0.3978 
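
One familiar method for two equations in two unknowns is Cramer's rule. The sketch below applies it to the normal equations, using the data of Table 9.4.1. The variable names are illustrative choices, and Cramer's rule is only one of several equivalent ways to solve the system.

```python
# Data from Table 9.4.1 (production X, manufacturing expenses Y).
x = [40, 42, 48, 55, 65, 79, 88, 100, 120, 140]
y = [150, 140, 160, 170, 150, 162, 185, 165, 190, 185]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)

# Normal equations (9.4.2 and 9.4.3):
#   sum_y  = n * a     + b * sum_x
#   sum_xy = a * sum_x + b * sum_x2
# Solve the 2x2 system by Cramer's rule.
det = n * sum_x2 - sum_x * sum_x
a = (sum_y * sum_x2 - sum_x * sum_xy) / det
b = (n * sum_xy - sum_x * sum_y) / det

print(round(a, 2), round(b, 4))
```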

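The following formulas for a and b are usually computationally more convenient:

$b = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2}$    (9.4.5)

$a = \frac{\sum y_i - b \sum x_i}{n} = \bar{y} - b\bar{x}$    (9.4.6)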


TABLE 9.4.2
Intermediate computations for normal equations, Example 9.4.1

         x_i      y_i     x_i^2    x_i y_i      y_i^2
          40      150     1,600      6,000     22,500
          42      140     1,764      5,880     19,600
          48      160     2,304      7,680     25,600
          55      170     3,025      9,350     28,900
          65      150     4,225      9,750     22,500
          79      162     6,241     12,798     26,244
          88      185     7,744     16,280     34,225
         100      165    10,000     16,500     27,225
         120      190    14,400     22,800     36,100
         140      185    19,600     25,900     34,225
Total    777    1,657    70,903    132,938    277,119



For the present example, we have

$b = \frac{132{,}938 - \frac{(777)(1657)}{10}}{70{,}903 - \frac{(777)^2}{10}} = 0.3978, \qquad a = 165.7 - 0.3978(77.7) = 134.79$

These results agree with those obtained by solving the normal equations.
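
Equations 9.4.5 and 9.4.6 translate directly into a short function. This is a minimal sketch, and the function name least_squares is a hypothetical label introduced here for illustration.

```python
def least_squares(x, y):
    """Return (a, b) for the least-squares line, using Equations 9.4.5 and 9.4.6."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # Equation 9.4.5
    a = sum_y / n - b * (sum_x / n)                               # Equation 9.4.6
    return a, b

# Data from Table 9.4.1.
x = [40, 42, 48, 55, 65, 79, 88, 100, 120, 140]
y = [150, 140, 160, 170, 150, 162, 185, 165, 190, 185]

a, b = least_squares(x, y)
print(round(a, 2), round(b, 4))

# Hand computation with b rounded to four decimals, as in the worked example.
a_hand = 165.7 - 0.3978 * 77.7
print(round(a_hand, 2))
```

The second form of Equation 9.4.5 is used because it avoids dividing by n twice. Either form gives the same slope.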

The equation for the least-squares line that describes the relationship between
production and manufacturing expenses is

$\hat{y} = 134.79 + 0.3978x$

If we let x = 0, $\hat{y}$ = 134.79. And if x = 100, $\hat{y}$ = 174.57. These two points
are sufficient for plotting the line, as we have done in Figure 9.4.3. This line is
the sought-after "best" line for describing the relationship between the sample
values of X and Y. Before we say by what criterion we judge it to be best, let us
look at Figure 9.4.3. None of the points actually fall on the line that was drawn.
That is, the points deviate from the line. It's obvious that we can't draw a straight
line that will pass through all the points. Some deviation of points from any straight
line is inevitable. The line drawn through the points, therefore, is best in this
sense:

The sum of the squared deviations of the observed data points ($y_i$) from the
least-squares line is smaller than the sum of the squared deviations of the data
points from any other line that can be drawn through the data points.


FIGURE 9.4.3 
Scatter diagram 
and least-squares 
line for Example 
9.4.1 



Suppose that we square the vertical distance from each observed point ($y_i$) to
the least-squares line, and add these squared distances over all points. The total
we get will be smaller than the similarly computed total for any other line that
we could draw through the original points. This is why we call the line the
least-squares line.

If, in our sample regression equation, we set x equal to $\bar{x}$, the mean of X, we
find that $\hat{y}$ is equal to $\bar{y}$, the mean of Y. Hence we see that the plotted line passes
through the point $(\bar{x}, \bar{y})$.
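
Both properties can be checked numerically. The sketch below compares the sum of squared vertical deviations for the least-squares line with the same sum for eight nearby lines, and confirms that the fitted line passes through the point of means. The size of the shifts (5 units in the intercept, 0.05 in the slope) is an arbitrary choice made for illustration.

```python
# Data from Table 9.4.1.
x = [40, 42, 48, 55, 65, 79, 88, 100, 120, 140]
y = [150, 140, 160, 170, 150, 162, 185, 165, 190, 185]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept (deviation form of Equation 9.4.5).
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

def sum_sq_dev(intercept, slope):
    """Sum of squared vertical deviations of the data from a given line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

best = sum_sq_dev(a, b)

# Eight other lines obtained by shifting the intercept and/or the slope.
shifts = [(da, db) for da in (-5.0, 0.0, 5.0) for db in (-0.05, 0.0, 0.05)
          if (da, db) != (0.0, 0.0)]
others = [sum_sq_dev(a + da, b + db) for da, db in shifts]

print(all(other > best for other in others))  # every other line does worse

# The fitted line evaluated at x_bar reproduces y_bar.
line_at_mean = a + b * x_bar
print(abs(line_at_mean - y_bar) < 1e-9)
```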

In these exercises: (a) plot the data as a scatter diagram, (b) obtain the least-squares 
regression equation, and (c) draw the regression line on the scatter diagram. 

9.4.1 A firm that sells office supplies wants to expand. The head of the firm wants to
know what sales volume can be expected in various market areas. Regression analysis,
with sales as the dependent variable, is suggested. It is decided that effective buying income
would be the best independent variable. A sample of 15 trade areas in which the firm now
does business gives the following results.

Amount of sales (Y)     Effective buying income (X)
(x $100,000)            (x $1,000,000)
      0.5                      11
      2.3                      69
      9.4                     168
      1.1                      22
      2.9                      38
      2.5                      30
      3.0                      51
      3.4                      61
      5.8                      83
      6.1                      91
      6.8                     101
      6.9                     124
      7.2                     159
     11.4                     176
     14.3                     201

$\sum x_i = 1385$    $\sum y_i = 83.6$    $\sum x_i y_i = 10{,}917.6$
$\sum x_i^2 = 179{,}661$    $\sum y_i^2 = 681.32$

In the regression equation $\hat{y} = a + bx$, what do the values of a and b mean to the
executive in the context of this problem?

9.4.2 A research analyst is studying the relationship between shopping-center traffic and
a department store's daily sales. The analyst develops an index to measure the daily volume
of traffic entering the shopping center, and an index of daily sales. The following table
shows the index values for 10 randomly selected days.

Traffic index (X):    71   82  111   85   89  110  111  121  129  132
Sales index (Y):     250  280  301  325  328  390  410  420  450  475

$\sum x_i = 1041$    $\sum y_i = 3629$    $\sum x_i y_i = 390{,}918$
$\sum x_i^2 = 112{,}359$    $\sum y_i^2 = 1{,}369{,}435$

In the regression equation $\hat{y} = a + bx$, what do the values of a and b mean to the shopping
center manager in the context of this problem?






9.4.3 The following data show the daily wages (X) and amount of monthly rent payments
(Y) for a random sample of 15 unskilled workers who live alone.

(Y), $:  120  130  135  138  142  149  155  158  160  169  170  175  182  190  195
(X), $:   34   37   39   42   41   45   40   52   50   62   68   65   70   68   75

$\sum x_i = 788$    $\sum y_i = 2368$    $\sum x_i y_i = 128{,}592$
$\sum x_i^2 = 44{,}162$    $\sum y_i^2 = 380{,}858$


9.4.4 The following table shows the hardness (in Brinell hardness numbers) and the tensile
strength (in thousands of pounds per square inch) of 10 specimens of a certain alloy.

Hardness (X):           20  30  40  50  60  70  80  90  100  25
Tensile strength (Y):   10  16  22  30  35  40  45  50   60  15


9.5 EVALUATING THE SAMPLE REGRESSION EQUATION 

After we have determined the regression equation, we must evaluate it to find out 
whether it adequately describes the relationship between the two variables, and to 
see whether we can use it effectively for prediction and estimation. 

Partitioning the Total Sum of Squares

One method of evaluating the regression equation is to compare the scatter of the
points about the regression line with the scatter about $\bar{y}$, the mean of the sample
values of Y. Figure 9.5.1 shows the regression line and the relative magnitudes
of the scatter of the points from $\bar{y}$ for Example 9.4.1. It shows the line representing
$\bar{y}$ as a horizontal line. This is because, regardless of the value of X, $\bar{y}$ remains
constant. For these data, the dispersion of the points about the regression line is
much less than the dispersion about the $\bar{y}$ line. So it seems that the regression line
provides a good fit for the data.

FIGURE 9.5.1
Scatter diagram for Example 9.4.1, showing deviations about $\bar{y}$ and the regression line

FIGURE 9.5.2
Total deviations $(y_i - \bar{y})$ for Example 9.4.1

We get the amount by which any observed value of Y, $y_i$, deviates from $\bar{y}$ by
measuring the vertical distance between $y_i$ and $\bar{y}$, as shown in Figure 9.5.1. This
difference $(y_i - \bar{y})$ is called the total deviation. Consider, for example, the ninth
value of Y. You will find it in Table 9.4.1 to be $y_9 = 190$. Since $\bar{y} = 165.7$, the
total deviation of this Y value is 190 - 165.7 = 24.3. Figure 9.5.2 shows the
total deviation for each observation.

The vertical distance from the regression line to the $\bar{y}$ line is given by $(\hat{y} - \bar{y})$.
This is called the explained deviation. It shows the amount by which we reduce
the total deviation when we fit the regression line to the points. For example, for
$y_9 = 190$, $\hat{y} = 182.5$. The explained deviation is $\hat{y} - \bar{y}$ = 182.5 - 165.7 =
16.8. Figure 9.5.3 shows the explained deviation for each observation.

Finally, the vertical distance of the observed Y from the regression line, $(y_i - \hat{y})$,
is called the unexplained deviation. It represents that portion of the total
deviation not "explained" or accounted for by the fitting of the regression line. In
the case of $y_9 = 190$, there is an unexplained deviation of $y_9 - \hat{y}$ = 190 -
182.5 = 7.5. Figure 9.5.4 shows the unexplained deviation for each observation.

Figure 9.5.1 shows the three deviations for $y_9$.

Thus the total deviation for a particular $y_i$ is equal to the sum of the explained
and unexplained deviations. That is,

$(y_i - \bar{y}) = (\hat{y} - \bar{y}) + (y_i - \hat{y})$    (9.5.1)

Total deviation = Explained deviation + Unexplained deviation

FIGURE 9.5.3
Explained deviations $(\hat{y} - \bar{y})$ for Example 9.4.1

In the case of $y_9 = 190$, we have 24.3 = 16.8 + 7.5. We can perform similar
calculations for each $y_i$.
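
The decomposition in Equation 9.5.1 can be checked for the ninth observation with a few lines of Python. This sketch refits the line from the data of Table 9.4.1 rather than using the rounded coefficients, so the printed values match the text only after rounding to one decimal.

```python
# Data from Table 9.4.1.
x = [40, 42, 48, 55, 65, 79, 88, 100, 120, 140]
y = [150, 140, 160, 170, 150, 162, 185, 165, 190, 185]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

# Ninth observation (index 8 because Python counts from 0).
x9, y9 = x[8], y[8]
y_hat9 = a + b * x9

total = y9 - y_bar            # total deviation
explained = y_hat9 - y_bar    # explained deviation
unexplained = y9 - y_hat9     # unexplained deviation

print(round(total, 1), round(explained, 1), round(unexplained, 1))
```

Because the explained and unexplained parts are defined relative to the same fitted value, they always add up to the total deviation, whatever the data.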

If we square each of the deviations in Equation 9.5.1 and sum for all
observations, we get three sums of squared deviations. Their relationship may be
expressed as follows:

$\sum (y_i - \bar{y})^2 = \sum (\hat{y} - \bar{y})^2 + \sum (y_i - \hat{y})^2$    (9.5.2)

Total sum of squares = Explained sum of squares + Unexplained sum of squares

Each of the terms in Equation 9.5.2 is a measure of dispersion. The total sum
of squares measures the dispersion of the observed values of Y about their mean
$\bar{y}$. That is, this term is a measure of the total variation in the observed values of
Y. It is the numerator of the familiar formula for the sample variance.

The explained sum of squares is a measure of the total variability in the observed 
values of Y that is accounted for by the linear relationship between the observed 
values of X and Y. This quantity is sometimes referred to as the sum of squares 
due to linear regression. 

The unexplained sum of squares measures the dispersion of the observed Y 
values about the regression line. It is sometimes referred to as the sum of squares 
of deviations from linearity. The unexplained sum of squares is the quantity that 
we minimize when we find the least-squares line. It is usually called the error 
sum of squares. We may write Equation 9.5.2 in a more compact form, as follows: 

SST = SSR + SSE    (9.5.3)

where SST = total sum of squares
      SSR = sum of squares due to regression (explained sum of squares)
      SSE = error sum of squares (unexplained sum of squares)

FIGURE 9.5.4
Unexplained deviations $(y_i - \hat{y})$ for Example 9.4.1


We can compute the total sum of squares by the following formula:

$SST = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n}$    (9.5.4)

We can compute the explained sum of squares by

$SSR = \sum (\hat{y} - \bar{y})^2 = b^2 \sum (x_i - \bar{x})^2 = b^2 \left[ \sum x_i^2 - \frac{(\sum x_i)^2}{n} \right]$    (9.5.5)


We can get the unexplained sum of squares by subtraction. That is, 

SSE = SST - SSR 
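
As a rough illustration of these computing formulas, the sketch below evaluates SST, SSR, and SSE for the production data, using the column totals of Table 9.4.2 and the slope rounded to four decimals (b = 0.3978) as in the worked example. Rounding b is a choice made here to mirror the hand calculation. Carrying b at full precision changes SSR and SSE slightly in the first decimal place.

```python
# Column totals from Table 9.4.2.
n = 10
sum_x, sum_y = 777, 1657
sum_x2, sum_y2 = 70903, 277119

b = 0.3978  # slope rounded to four decimals, as in the worked example

sst = sum_y2 - sum_y ** 2 / n              # Equation 9.5.4
ssr = b ** 2 * (sum_x2 - sum_x ** 2 / n)   # Equation 9.5.5
sse = sst - ssr                            # by subtraction

print(round(sst, 2), round(ssr, 2), round(sse, 2))
```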

From the data on production and manufacturing expenses, we may compute

$SST = 277{,}119 - \frac{(1657)^2}{10} = 2554.10$

Alternatively, we may compute SST by squaring and summing the individual
total deviations $(y_i - \bar{y})$, as shown in Figure 9.5.2. When we do this, we have

$(-15.7)^2 + (-25.7)^2 + \cdots + (19.3)^2 = 246.49 + 660.49 + \cdots + 372.49 = 2554.10$

By Equation 9.5.5, the explained sum of squares, or sum of squares due to
regression, is

$SSR = (0.3978)^2 \left[ 70{,}903 - \frac{(777)^2}{10} \right] = 1666.33$

Or we can get the explained sum of squares by squaring and summing the
explained deviations $(\hat{y} - \bar{y})$, shown in Figure 9.5.3, to give

$SSR = (-15)^2 + (-14.2)^2 + \cdots + (24.8)^2 = 225.0 + 201.64 + \cdots + 615.04 = 1666.44$

The unexplained, or error, sum of squares, obtained by subtraction, is

SSE = 2554.10 - 1666.33 = 887.77

As an alternative, we can compute SSE by squaring and summing the individual
unexplained deviations $(y_i - \hat{y})$, shown in Figure 9.5.4. Thus

$SSE = (-0.7)^2 + (-11.5)^2 + \cdots + (-5.5)^2 = 0.49 + 132.25 + \cdots + 30.25 = 886.54$

Note a slight discrepancy due to rounding in the results for SSR and SSE computed
by the two methods.

Analysis of Variance

When the assumptions we gave in Section 9.3 hold, we may use analysis of
variance to test for the presence of regression. In this process, the total sum of
squares $\sum (y_i - \bar{y})^2$ is a measure of the total variability present in the data. The
explained sum of squares $\sum (\hat{y} - \bar{y})^2$ is a measure of the variability due to linear
regression. And the unexplained sum of squares $\sum (y_i - \hat{y})^2$ is a measure of the
variability left unexplained after regression has been considered. This last sum of
squares is also called the deviations from regression or error sum of squares. We
can also subdivide the total degrees of freedom (n - 1) into two components, 1
for regression and (n - 1) - 1 = (n - 2) associated with the error sum of
squares. Dividing the sums of squares by their associated degrees of freedom
yields corresponding mean squares. If there is no linear regression (that is, if
$\beta = 0$), and if the stated assumptions about the model apply, the ratio of the
regression mean square to the error mean square is distributed as F with 1 and
(n - 2) degrees of freedom.

We can, therefore, test the null hypothesis that $\beta = 0$ using analysis of variance.
Table 9.5.1 shows the analysis-of-variance table that we can construct.

TABLE 9.5.1
ANOVA table for simple linear regression

Source of variation                 SS     df       MS                    F
Linear regression                   SSR    1        MSR = SSR/1           MSR/MSE
Deviation from linearity (error)    SSE    n - 2    MSE = SSE/(n - 2)
Total                               SST    n - 1
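
The layout of Table 9.5.1 can be filled in mechanically once SSR and SSE are known. The following sketch does so for the production data of Example 9.4.1, taking SSR = 1666.33 and SSE = 887.77 from the hand calculations. The function name and dictionary keys are illustrative, and the critical value 14.69 (the upper 0.005 point of F with 1 and 8 degrees of freedom) is entered as a constant read from an F table rather than computed.

```python
def anova_simple_regression(ssr, sse, n):
    """Build the ANOVA quantities of Table 9.5.1 for simple linear regression."""
    df_regression = 1
    df_error = n - 2
    msr = ssr / df_regression
    mse = sse / df_error
    return {
        "SST": ssr + sse,
        "df_regression": df_regression,
        "df_error": df_error,
        "df_total": n - 1,
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
    }

table = anova_simple_regression(ssr=1666.33, sse=887.77, n=10)
print(round(table["MSE"], 2), round(table["F"], 2))

# Tabulated upper 0.005 point of F with 1 and 8 degrees of freedom.
F_UPPER_0005 = 14.69
print(table["F"] > F_UPPER_0005)
```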




TABLE 9.5.2
Analysis of variance for Example 9.4.1

Source         SS          df    MS          F
Regression     1,666.33    1     1,666.33    15.02
Error            887.77    8       110.97
Total          2,554.10    9

For the data on production and manufacturing expenses, let us test

$H_0$: there is no linear regression between X and Y ($\beta = 0$)

against

$H_1$: there is a linear regression of Y on X ($\beta \neq 0$)

at the 0.01 level of significance. Table 9.5.2 shows the appropriate analysis of
variance. The computed value of F = 15.02 is significant at the 0.01 level. Thus
we may conclude that the data of this sample provide sufficient evidence of the
presence of regression. Since 15.02 > 14.69, we have, for this test, p < 0.005.

When we can't reject $H_0$: $\beta = 0$, we can't be certain that X and Y are unrelated.
Aside from the fact that we may have committed a Type II error, we must be
aware that, although they are perhaps not linearly related, X and Y may have a
nonlinear relationship. Even when we can reject $H_0$: $\beta = 0$, we can't be certain
that the strongest form of relationship between X and Y is a linear one. The two
variables may be more strongly related in a nonlinear way, although a linear model
gives a satisfactory approximation to the true relationship. Of course, a rejected
null hypothesis that $\beta = 0$ may very well indicate that there is a true linear
relationship between X and Y.

Another Hypothesis Test About $\beta$

An alternative way to evaluate the sample regression equation is to use b, the
slope of the sample line, as a basis for testing the null hypothesis of no regression.

When the assumptions in Section 9.3 are met, a and b are unbiased point
estimators, respectively, of $\alpha$ and $\beta$. When, under these assumptions, the
subpopulations of Y values are normally distributed, the sampling distributions of a
and b are each normal, with means and variances as follows:

$\mu_a = \alpha$    (9.5.6)

$\sigma_a^2 = \frac{\sigma_{y|x}^2 \sum x_i^2}{n \sum (x_i - \bar{x})^2}$    (9.5.7)

$\mu_b = \beta$    (9.5.8)

$\sigma_b^2 = \frac{\sigma_{y|x}^2}{\sum (x_i - \bar{x})^2}$    (9.5.9)

In Equations 9.5.7 and 9.5.9, $\sigma_{y|x}^2$ is the variance about the population regression
line. We also call $\sigma_{y|x}^2$ the unexplained variance of the population. It is the common
variance $\sigma^2$ of the subpopulations of Y as specified in the initial assumptions. The
definitional equation for this quantity, for a finite population of size N, is:

$\sigma_{y|x}^2 = \frac{\sum_{i=1}^{N} (y_i - \mu_{y|x})^2}{N}$    (9.5.10)



When the assumptions are met, then, we can construct confidence intervals for,
and test hypotheses about, $\alpha$ and $\beta$ in the usual way. In most cases, inferences
about $\alpha$ are not of great interest. The parameter $\beta$, however, is of great interest.
If $\beta = 0$, the regression line is horizontal, and an increase or decrease in X is
not associated with a change in Y. In this situation, we conclude that X and Y are
not linearly related. A positive $\beta$ indicates that, generally, Y tends to increase as
X increases. In this situation, there is a direct linear relationship between X and
Y. A negative $\beta$ indicates that values of Y tend to decrease as values of X increase,
and there is an inverse linear relationship between X and Y. Figure 9.5.5 illustrates
these three situations.

We want to determine whether the sample data provide sufficient evidence to
indicate that $\beta$ is different from 0. Suppose that we can reject the null hypothesis
that $\beta = 0$. Then we can conclude that $\beta$ is not equal to 0, and therefore that
there is a linear relationship between X and Y. Whether this suggested linear
relationship is presumed to be direct or inverse depends on the sign of b, the
estimate of $\beta$.

The test statistic, when $\sigma_{y|x}^2$ is known, is

$z = \frac{b - \beta_0}{\sigma_b}$    (9.5.11)

In the usual case, $\sigma_{y|x}^2$ is unknown and the test statistic is

$t = \frac{b - \beta_0}{s_b}$    (9.5.12)

where $s_b$ is the estimator of $\sigma_b$ and $\beta_0$ is the hypothesized value of $\beta$. The associated
degrees of freedom are n - 2, the error degrees of freedom from the ANOVA table.

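To find $s_b$, we must first estimate $\sigma_{y|x}^2$. An unbiased estimator of this is given
by

$s_{y|x}^2 = \frac{\sum (y_i - \hat{y})^2}{n - 2}$

An alternative formula for $s_{y|x}^2$ is

$s_{y|x}^2 = \frac{1}{n - 2} \left[ \sum y_i^2 - \frac{(\sum y_i)^2}{n} - b \left( \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n} \right) \right] = \frac{1}{n(n - 2)} \left[ n \sum y_i^2 - (\sum y_i)^2 - b \left( n \sum x_i y_i - (\sum x_i)(\sum y_i) \right) \right]$

The estimator, $s_{y|x}^2$, is the same as the error mean square appearing in the
analysis-of-variance table. An unbiased estimator of $\sigma_b^2$, then, is

$s_b^2 = \frac{s_{y|x}^2}{\sum (x_i - \bar{x})^2}$

The following formula takes less work:

$s_b^2 = \frac{s_{y|x}^2}{\sum x_i^2 - (\sum x_i)^2 / n}$

FIGURE 9.5.5
Scatter diagrams showing different types of linear relationships: (a) direct linear
relationship, (b) inverse linear relationship, (c) no linear relationship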

Let us now use the example of production and manufacturing expenses
(Example 9.4.1) to show how to test the null hypothesis that $\beta = 0$. First we state
the hypotheses and significance level:

$H_0: \beta = 0, \qquad H_1: \beta \neq 0$

Let $\alpha = 0.05$. We next obtain $s_{y|x}^2$, which, from Table 9.5.2, is

$s_{y|x}^2 = MSE = 110.97$

We may now compute

$s_b^2 = \frac{110.97}{70{,}903 - (777)^2/10} = 0.0105$    and    $s_b = \sqrt{0.0105} = 0.102$

The figures in the denominator of $s_b^2$ come from Table 9.4.2.

The test statistic that we may compute is

$t = \frac{0.3978 - 0}{0.102} = 3.9$

We reject $H_0$, since 3.9 > 2.306, the upper critical value of t for a two-sided
test with 8 degrees of freedom and $\alpha = 0.05$. Thus we conclude that $\beta$ is not 0
and that there is a linear relationship between X and Y. Since b is positive, we
conclude that the relationship is direct, not inverse. Since 3.9 > 3.3554, p <
2(0.005) = 0.01.

Note that the decision resulting from testing $H_0$: $\beta = 0$ by means of the t test
is the same as that reached using analysis of variance. In fact, the value of t
computed from Equation 9.5.12 is equal to the square root of the F computed in
the analysis of variance. (In practice, small differences may occur because of
rounding.)

We can use Equation 9.5.12 to test the null hypothesis that $\beta$ is equal to some
value other than 0. The hypothesized value for $\beta$, $\beta_0$, replaces 0 in the equation.
All other quantities, computations, degrees of freedom, and methods of
determining significance are the same as in the example.

A Confidence Interval for $\beta$

Alternatively, we can test the null hypothesis that $\beta = 0$ by means of a confidence
interval for $\beta$. We use the general formula for a confidence interval,

Estimate ± (reliability factor) x (standard error)

When we construct a confidence interval for $\beta$, the estimator is b. The reliability
factor is some value of z or t (depending on whether or not $\sigma_{y|x}^2$ is known). And
the standard error of the estimator is

$\sigma_b = \sqrt{\frac{\sigma_{y|x}^2}{\sum (x_i - \bar{x})^2}}$

When $\sigma_{y|x}^2$ is unknown, we estimate $\sigma_b$ by

$s_b = \sqrt{\frac{s_{y|x}^2}{\sum (x_i - \bar{x})^2}}$

Thus in most practical cases, the 100(1 - $\alpha$)% confidence interval for $\beta$ is given
by

$b \pm t_{1 - \alpha/2} \, s_b$    (9.5.17)

If the confidence interval that we construct includes 0, we conclude that 0 is a
candidate for $\beta$. Therefore we cannot rule out the possibility that $\beta$ is 0. This
conclusion corresponds to the statistical decision of failing to reject $H_0$: $\beta = 0$.
If, on the other hand, the interval does not contain 0, we reject the null hypothesis
that $\beta = 0$. We conclude that X and Y are linearly related. The strength of this
conclusion is related to the confidence coefficient selected in constructing the
interval.

Let us construct a 95% confidence interval for $\beta$, using the data from Example
9.4.1. We can construct the following 95% confidence interval using Expression
9.5.17:

0.3978 ± 2.306(0.102)

0.3978 ± 0.2352

0.1626, 0.6330




We interpret this interval in the usual way. From the probabilistic point of view,
we say that if we were to draw samples of size 10 repeatedly from the population
and compute a confidence interval for $\beta$, 95% of these intervals would, in the
long run, include the parameter $\beta$. From a practical standpoint, we say that we
are 95% confident that the single interval that we have constructed includes $\beta$.

The interval 0.1626 to 0.6330 does not include 0. We therefore conclude that
$\beta$ is not 0 and that there is a linear relationship between X and Y. This is the same
conclusion that we reached by means of the hypothesis tests described earlier.
When carried out at the same significance level, the three inferential procedures
always lead to the same conclusion.
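
The t test and the confidence interval can be reproduced together in a short sketch. It takes b = 0.3978 and MSE = 110.97 from the worked example and enters the critical value 2.306 (t with 8 degrees of freedom, two-sided 0.05 level) as a constant from a t table. Unlike the hand calculation, the sketch does not round $s_b$ to 0.102, so the interval limits differ from 0.1626 and 0.6330 in the third decimal place.

```python
import math

# Quantities from the worked example (Tables 9.4.2 and 9.5.2).
n = 10
b = 0.3978
mse = 110.97                      # estimate of the variance about the line
sum_x, sum_x2 = 777, 70903

# Standard error of b (the "less work" formula).
s_b = math.sqrt(mse / (sum_x2 - sum_x ** 2 / n))

# Test statistic for H0: beta = beta_0, here with beta_0 = 0 (Equation 9.5.12).
beta_0 = 0.0
t = (b - beta_0) / s_b

# Tabulated critical value: t with n - 2 = 8 df, two-sided alpha = 0.05.
T_CRITICAL = 2.306
reject_h0 = abs(t) > T_CRITICAL

# 95% confidence interval for beta (Expression 9.5.17).
lower = b - T_CRITICAL * s_b
upper = b + T_CRITICAL * s_b

print(round(s_b, 3), round(t, 1), reject_h0)
print(round(lower, 3), round(upper, 3))
print(round(t ** 2, 2))  # compare with F from the ANOVA table
```

The interval excludes 0 exactly when the absolute value of t exceeds the same critical value, which is why the two procedures agree at a common significance level.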


The Coefficient of 
Determination 


Another measure of how well the least-squares line fits the observed data is the 
coefficient of determination. If you interpret this descriptive measure properly, it 
helps you decide whether the regression equation you’ve obtained is likely to be 
useful for prediction and estimation. 

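Let us define the sample coefficient of determination as

$r^2 = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y_i - \bar{y})^2}$    (9.5.18)

A useful computing formula for $r^2$ is given by

$r^2 = \frac{b^2 \sum (x_i - \bar{x})^2}{\sum (y_i - \bar{y})^2} = \frac{b^2 [\sum x_i^2 - (\sum x_i)^2 / n]}{\sum y_i^2 - (\sum y_i)^2 / n}$    (9.5.19)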


In words, we say that the coefficient of determination is equal to the ratio of the
explained sum of squares to the total sum of squares. As such, it indicates what
proportion of the total variation in Y is explained by the regression of Y on X.
Since we compute $r^2$ from sample data, it measures a characteristic of these
sample data, not of the total population of data. The population counterpart of
$r^2$ is usually designated by $\rho^2$. Thus we use $r^2$ to estimate $\rho^2$, the population
coefficient of determination. We define $\rho^2$ the same way we define $r^2$ for a
sample. That is,

$\rho^2 = \frac{\sum (\mu_{y|x} - \mu_y)^2}{\sum (y_i - \mu_y)^2}$

We can interpret the sample coefficient of determination in the following ways. 

1. We may interpret $r^2$ as a measure of the closeness of fit of the regression
equation to the sample data. The better the fit of the computed regression line,
the closer $r^2$ will be to 1. In other words, if the regression line provides a perfect
fit, the total variation in Y is completely explained, and $r^2$ is exactly equal to 1.
If, in Equation 9.5.2, the unexplained sum of squares is 0, the total sum of squares
and the explained sum of squares are equal. Figure 9.5.6(a) shows this situation.
On the other hand, if the regression line is very close to the $\bar{y}$ line, it will explain
a small proportion of the total variation in Y, and $r^2$ will approach 0. Figure
9.5.6(b) illustrates this concept.

2. We may also think of $r^2$ as a measure of the relative reduction in the total
sum of squares achieved by fitting a regression line. As we have implied, the
relative reduction may be 0, 1, or any amount in between.

3. Finally, we may interpret $r^2$ as a measure of the linearity of the data points.
When the regression line fits the data well, the data points are such that their
scatter diagram gives the impression of a straight line. On the other hand, when
the fit is not good, the points are so widely scattered that the diagram does not
suggest a straight line. Figure 9.5.7(a) and (b) illustrate this. This interpretation
requires that the points have a distribution and that $b \neq 0$. When all values of Y
are the same, b = 0, $\hat{y}$ is a constant, and there is no observed linear relationship
between X and Y. In this case $y_i = \bar{y}$ for all $y_i$, so that $\sum (y_i - \bar{y})^2$, the denominator of
the formula for $r^2$, is equal to 0, and $r^2$ has no meaning. Figure 9.5.7(c) illustrates
this.

FIGURE 9.5.6
Scatter diagrams illustrating different values of $r^2$


Let us illustrate the calculation of r 2 using the data on production and manu¬ 
facturing expenses in Example 9.4.1. Table 9.4.2 gives the needed preliminary 
calculations. By Equation 9.5.19, we compute 

2 (0.3978) 2 [70,903 - (777) 2 /10] 

r “ 277,119 - (1657) 2 /10 

Thus the regression of Y on X explains 65% of the total variability in Y. 

The sample coefficient of determination provides a point estimator of p 2 , the 
population coefficient of determination. When the number of degrees of freedom 
is small, however, r 2 is positively biased. An unbiased estimator is provided by 


S(y, - y) 2 /(n - 2) 
£(» - y) 2 /(n - l) 


0.65 


(9.5.20) 





The numerator of this fraction is the unexplained mean square, and the
denominator is the total mean square. Thus the difference between \(r^2\) and \(\tilde{r}^2\) is due to
the factor \((n - 1)/(n - 2)\). When n is large, this factor approaches 1 and the
difference between \(r^2\) and \(\tilde{r}^2\) approaches 0.

For the example, we may compute

\[ \tilde{r}^2 = 1 - \frac{(887.77)/8}{(2554.10)/9} = 0.61 \]

In this example, the difference between \(r^2\) and \(\tilde{r}^2\) is small.
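A minimal sketch of the adjusted coefficient follows, assuming the estimator has the form 1 minus the ratio of the unexplained mean square to the total mean square, as described above. The function name is hypothetical; the sums of squares are those used in the example.

```python
def adjusted_r2(unexplained_ss, total_ss, n):
    """1 - (unexplained mean square) / (total mean square) for simple regression."""
    unexplained_ms = unexplained_ss / (n - 2)
    total_ms = total_ss / (n - 1)
    return 1 - unexplained_ms / total_ms


n = 10
r2 = 1 - 887.77 / 2554.10                  # ordinary coefficient of determination
r2_adj = adjusted_r2(887.77, 2554.10, n)   # adjusted for degrees of freedom

# Equivalent form that shows the (n - 1)/(n - 2) factor explicitly
r2_adj_alt = 1 - (1 - r2) * (n - 1) / (n - 2)
print(round(r2, 3), round(r2_adj, 3))
```

The second form makes the role of the factor (n - 1)/(n - 2) visible: as n grows the factor tends to 1 and the two coefficients coincide.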


Residual Plots

As noted in Section 9.3, for the valid use of inferential procedures in regression
analysis, certain assumptions about the sampled population must be met. We may
state these assumptions in terms of \(e_i\) of Equation 9.2.1. We refer to the \(e_i\)
component of the model as the error term. The calculated residuals are estimates of
the error term in the model. The residuals in a given application are the
unexplained components of the individual total deviations. That is, \(e_i = y_i - \hat{y}_i\). Figure
9.5.4 graphs the residuals for Example 9.4.1.

FIGURE 9.5.7 Scatter diagrams illustrating different degrees of closeness of fit
of observed values of Y to a regression line: (a) Good fit, scatter diagram
suggests straight line; (b) Poor fit, scatter diagram does not suggest straight
line; (c) All values of Y are equal; b = 0; \(r^2\) has no meaning and there is no
observed linear relationship between X and Y

If, then, we wish to determine whether or not the data satisfy the assumptions,
we focus on the residuals. Succeeding chapters will present statistical tests that
you can use to test the following assumptions:

1. The population \(e_i\) are normally distributed with a mean of 0

2. The subpopulations of Y values for given X values all have the same
variance, \(\sigma^2\)

3. The \(e_i\) are independent

We may, however, use a simple technique to help us decide whether or not it
appears likely that the assumptions are violated. The technique consists of plotting
the residuals. Draper and Smith (1981), who discuss the method in detail, point
out that a study of residual plots is very revealing when the assumptions are
violated. See also Neter and Wasserman (1974). We may plot the residuals in
many ways. One way is to plot them against the independent variable X. That is,
we plot values of X on the horizontal axis and the residuals on the vertical axis.

If a scatter diagram of sample residuals has a pattern that suggests a horizontal 
band centered on 0, this is taken as a lack of evidence that the assumptions are 
violated. Figure 9.5.8(a) shows a scatter diagram that is compatible with the 
assumptions. You must realize that we are not saying that such a scatter diagram 
indicates that the assumptions are met. Rather, we are saying that this particular 
plot does not indicate that they are violated. 

A scatter diagram that conforms to the pattern in Figure 9.5.8(b) suggests that
the variances in the subpopulations are not all equal. This scatter diagram suggests
that \(\sigma^2_{y|x}\) increases as X increases. When the scatter diagram resembles Figure
9.5.8(c), this indicates that the relationship between X and Y is curvilinear rather
than linear.

When error terms are correlated, the correlation is often due to a dependence 
between successive values. This type of correlation is called autocorrelation or 
serial correlation. If we suspect that the assumptions are violated because of 
autocorrelation, a plot of the residuals against time may prove helpful. A scatter 
diagram resembling that of Figure 9.5.8(d) suggests the presence of autocorrelated 
error terms. 

FIGURE 9.5.8 Some typical residual plots

FIGURE 9.5.9 Residual plot for Example 9.4.1

Suppose that when we examine the residuals, we doubt that our assumptions
apply to the data at hand. Must we then discontinue the analysis? Not necessarily.
Moderate departures from the assumption of normality do not invalidate the
results, since the inferential procedures are not overly sensitive to violations of that
assumption. Violations of the assumptions of equal variances and/or independent
error terms usually call for some type of remedial action. Usually a proper
transformation of the dependent variable Y solves the problems introduced by these
violations. A discussion of transformations is beyond the scope of this book. [If
you are interested, see Neter and Wasserman (1974), who discuss the technique.]

To illustrate a plot of sample residuals, let us refer to the residuals of Example
9.4.1, which are shown in Figure 9.5.4. When we plot these residuals, we have
the scatter diagram shown in Figure 9.5.9. This scatter diagram suggests that the
subpopulation variances may not be equal, but tend to decrease as X increases.

Exercises

9.5.1 Refer to Exercise 9.4.1. (a) Compute the coefficient of determination. (b) Prepare
an ANOVA table and test the null hypothesis of no linear relationship between the two
variables. (c) Test the null hypothesis that \(\beta = 0\) using a 0.05 level of significance. (d)
State your conclusions in terms of the problem. (e) Construct the 95% confidence interval
for \(\beta\). (f) Determine the p value for each test. (g) Arrange the residuals in the order of
their corresponding X values and sketch the relationship.

9.5.2 Repeat steps (a) through (f) of Exercise 9.5.1, using the data of Exercise 9.4.2.
Assume that the sample values were collected in the sequence 1, 2, 3, . . . , 10 (time =
1, 2, 3, . . . , 10) and sketch the residuals accordingly, with time as the independent
variable.

9.5.3 Repeat steps (a) through (f) of Exercise 9.5.1 using the data of Exercise 9.4.3. 

9.5.4 Repeat steps (a) through (f) of Exercise 9.5.1 using the data of Exercise 9.4.4. 


9.6 USING THE SAMPLE REGRESSION EQUATION 


Once we have decided that the data at hand provide sufficient evidence to indicate 
a linear relationship between X and Y, we can use the sample regression equation. 
We can use it in two ways. First, we can use it to predict what value Y is likely 
to assume for a given value of X. When the assumptions of Section 9.3 are met, 
we can construct a prediction interval for Y. Second, we can use it to estimate 
the mean of the subpopulation of Y values for a particular value of X. Again, if 
the assumptions of Section 9.3 are met, we can construct a confidence interval 
for the mean. 

When we use a regression equation to make an inference about the Y value of
a single subject (or other entity), given the subject’s X value, we call the procedure
predicting. For example, suppose that we have a regression equation that describes
the relationship between the grade-point averages (Y) of college students and their
scores on the Scholastic Aptitude Test (X). We may wish to use the equation to
predict the grade-point average of a given student about to enter college.

When we use a regression equation to make an inference about the mean Y 
score of a population of subjects all of whom have the same X value, we call the 
procedure estimating. We may, for example, want to use the regression equation 
describing the relationship between college students’ grade-point averages and 
their SAT scores to estimate the mean grade-point average of a population of 
college-bound students all of whom have the same SAT score. 

For any given value of X, the predicted value of Y and the point estimate of 
the mean of the subpopulation of Y are numerically the same. However, the two 
intervals are not of the same width. This seems reasonable, since the estimate of 
the mean ought to be subject to less variation than the estimate of a single value. 


Predicting Y for a Given X

We get a point prediction of the value Y is likely to assume for a given X by
substituting a particular value of X, \(x_p\), into the sample regression equation and
solving for \(\hat{y}\). If the assumptions of Section 9.3 are met, and if \(\sigma^2_{y|x}\) is unknown,
the \(100(1 - \alpha)\%\) prediction interval for Y is given by

\[ \hat{y} \pm t_{1-\alpha/2}\, s_{y|x} \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}} \qquad (9.6.1) \]

We can evaluate the denominator, \(\sum (x_i - \bar{x})^2\), by means of the formula

\[ \sum x_i^2 - \frac{(\sum x_i)^2}{n} \]

The degrees of freedom used in selecting t are n - 2.


Suppose that, in Example 9.4.1, we wish to predict the manufacturing expenses
for a firm that produces 50,000 units. Substituting 50 for x in the sample regression
equation gives

\[ \hat{y} = 134.79 + 0.3978(50) = 155 \]

Using Expression 9.6.1 and the data from Tables 9.4.2 and 9.5.2, we construct
the following 95% prediction interval:

\[ 155 \pm 2.306\sqrt{110.97}\,\sqrt{1 + \frac{1}{10} + \frac{(50 - 77.7)^2}{70{,}903 - (777)^2/10}} \]

$155 ± $26, $129, $181

Interpreting a prediction interval is like interpreting a confidence interval. If we
repeatedly draw samples, do a regression analysis, and construct prediction
intervals for firms that produce 50,000 units, 95% of the intervals will include the
manufacturing expenses. This is the probabilistic interpretation. The practical
interpretation is that we are 95% confident that the single prediction interval
constructed includes the true manufacturing expenses.

Estimating the Mean of Y for a Given X

To estimate the mean \(\mu_{y|x}\) of a subpopulation of Y values for a certain value of
X, \(x_p\), we substitute \(x_p\) into the sample regression equation and solve for \(\hat{y}\).

The \(100(1 - \alpha)\%\) confidence interval for \(\mu_{y|x}\), when \(\sigma^2_{y|x}\) is unknown and the
assumptions of Section 9.3 are met, is given by


\[ \hat{y} \pm t_{1-\alpha/2}\, s_{y|x} \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}} \qquad (9.6.2) \]


Suppose that, for the example of the production and manufacturing expenses,
we wish to estimate the mean of the subpopulation of Y values for firms that
produce 50,000 units. We obtain the point estimate as follows:

\[ \hat{y} = 134.79 + 0.3978(50) = 155 \]

Using Expression 9.6.2, we obtain the 95% confidence interval for \(\mu_{y|x}\):

\[ 155 \pm 2.306\sqrt{110.97}\,\sqrt{\frac{1}{10} + \frac{(50 - 77.7)^2}{70{,}903 - (777)^2/10}} \]

$155 ± $10, $145, $165

If we repeatedly drew samples of size 10 from the population, performed a
regression analysis, and constructed confidence intervals for \(\mu_{y|x}\) for X = 50, 95%
of such intervals would include the true mean. Thus we are 95% confident that
the single interval constructed contains the true mean.
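The two interval calculations differ only by the extra 1 under the radical, which a short Python sketch can make explicit. The function name is hypothetical, and the critical value t = 2.306 is passed in from the t table rather than computed; the other inputs are the quantities used in the worked examples.

```python
import math


def interval_half_widths(x_p, n, x_bar, sxx, mse, t_crit):
    """Return (prediction, confidence) half-widths per Expressions 9.6.1 and 9.6.2."""
    s = math.sqrt(mse)                              # s_{y|x}
    leverage = 1 / n + (x_p - x_bar) ** 2 / sxx     # quantity shared by both radicals
    prediction = t_crit * s * math.sqrt(1 + leverage)
    confidence = t_crit * s * math.sqrt(leverage)
    return prediction, confidence


n = 10
sxx = 70_903 - 777 ** 2 / n          # sum of (x_i - x_bar)^2 from the table sums
y_hat = 134.79 + 0.3978 * 50         # point prediction and point estimate at X = 50
pred_hw, conf_hw = interval_half_widths(50, n, 77.7, sxx, 110.97, 2.306)

print(f"prediction interval: {y_hat - pred_hw:.1f} to {y_hat + pred_hw:.1f}")
print(f"confidence interval: {y_hat - conf_hw:.1f} to {y_hat + conf_hw:.1f}")
```

Because the prediction radical always exceeds the confidence radical, the prediction interval is wider at every value of X, which is the point made earlier about a single value being subject to more variation than a mean.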

Suppose that we construct confidence intervals for several subpopulation means
and plot the upper and lower limits on the same scatter diagram with the regression
line. We may construct a confidence band by connecting all the upper limits with
one curve and all the lower limits with another curve. Table 9.6.1 gives the upper
and lower 95% confidence limits for \(\mu_{y|x}\) for selected values of X in the example
of production versus manufacturing expenses. Figure 9.6.1 shows the 95%
confidence band that results when we plot these values.

TABLE 9.6.1 95% confidence limits, selected values of X, Example 9.4.1

X              40.0    50.0    60.0    77.7    100.0   120.0   140.0
Lower limit    139     145     150     158     166     170     173
Upper limit    163     165     168     174     184     196     207

FIGURE 9.6.1 Regression line and 95% confidence band for Example 9.4.1

Note that the confidence band of Figure 9.6.1 is wider at the ends than in the
middle. In fact, the band is narrowest for \(x_p = \bar{x}\), since the quantity under the
radical of Expression 9.6.2 is smallest when we use the mean of the X values as
the particular value of X. As \(x_p\) increases or decreases, the quantity under the
radical becomes larger and the corresponding intervals become wider. We can
construct prediction bands, using prediction intervals, in a similar manner.

[Graybill and Bowden (1967) present a method for constructing straight-line
confidence bands. They suggest that bands of this type may be preferable to
curvilinear bands because they are easier to compute and graph, and have a smaller
average width. Dunn (1968) constructed a confidence band that she says is
preferable to that of Graybill and Bowden. She suggests that the usual curvilinear
confidence band is to be preferred over any of the straight-line variety. Hoel
(1951), Gafarian (1964), Halperin and Gurian (1968), and Elston and Grizzle
(1962) also discuss this topic.]

Use of Computers

The computations needed for carrying out a complete regression analysis can take
time, especially if there are many observations and if the numbers are large or
involve many decimal places. But this is not a major problem for those who have
access to a computer. Even if you don’t write your own computer programs, there
are many “canned” programs available that will perform all the calculations you
need for a complete regression analysis. The printed output from these programs
includes such calculated measures as numerical values for a, b, \(r^2\), \(\bar{x}\), \(\bar{y}\), \(\sum x\), \(\sum y\);
explained, unexplained, and total sums of squares; confidence intervals for \(\alpha\) and
\(\beta\); and predicted values of Y. When you have a computer to provide such output,
you can concentrate on improving the quality of the raw data and interpreting the
output, rather than spending hours on tedious calculations.
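As a small illustration of handing such arithmetic to a computer, the sketch below recomputes 95% confidence limits for the mean of Y at the X values listed in Table 9.6.1. The constant names and the helper function are assumptions made for this example; the inputs are the rounded quantities from the hand calculations, so the limits should agree with the table only to within about a unit.

```python
import math

# Quantities from the worked example (rounded as in the hand calculations)
A, B = 134.79, 0.3978              # intercept and slope of the sample regression line
N, X_BAR = 10, 77.7                # sample size and mean of X
MSE, T_CRIT = 110.97, 2.306        # unexplained mean square; t for 8 degrees of freedom
SXX = 70_903 - 777 ** 2 / N        # sum of (x_i - x_bar)^2


def confidence_limits(x_p):
    """Lower and upper 95% limits for the mean of Y at x_p (Expression 9.6.2)."""
    y_hat = A + B * x_p
    half = T_CRIT * math.sqrt(MSE) * math.sqrt(1 / N + (x_p - X_BAR) ** 2 / SXX)
    return y_hat - half, y_hat + half


band = {x: confidence_limits(x) for x in (40.0, 50.0, 60.0, 77.7, 100.0, 120.0, 140.0)}
for x, (lower, upper) in band.items():
    print(f"{x:6.1f}  {lower:7.1f}  {upper:7.1f}")
```

Plotting the lower and upper limits against X and joining each set with a smooth curve gives a confidence band of the kind described above, narrowest at the mean of X.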

Figure 9.6.2 shows the printout of a computer program that performs a simple
linear regression analysis. The data of Example 9.4.1 were entered manually on
a remote teletype terminal and transmitted over a telephone line to the computer.
When we compare the results given on the computer printout with those obtained
by hand, we note some differences due to rounding errors. On the printout, the
dependent variable Y and the independent variable X are referred to as variables
1 and 2, respectively. The entries in the “Residual” column are the sample residuals.

FIGURE 9.6.2 Computer printout for analysis of data for Example 9.4.1

    MULTIVARIATE CURVE FIT

    VARIABLE   REGR COEFF              MEAN VALUE   STD DEV
    1          (CONSTANT = 134.788)    165.7        15.9832
    2          .397842                 77.6999      32.4503

    INDEX OF DETERMINATION (R-SQ) = .652558

    CORRELATION MATRIX
    1          .807731
    .807731    1

    DO YOU WISH TABLE OF VALUES PRINTED?YES

    ACTUAL VS CALCULATED

    ACTUAL   CALCULATED   RESIDUAL    PCT RESIDUAL
    150      150.701      -.701283    -.4
    140      151.497      -11.4989    -7.5
    160      153.884      6.116       3.9
    170      156.669      13.3311     8.5
    150      160.647      -10.6473    -6.6
    162      166.217      -4.2171     -2.5
    185      169.798      15.2023     8.9
    165      174.572      -9.57178    -5.4
    190      182.529      7.47139     4
    185      190.485      -5.48548    -2.8

    ANALYSIS OF VARIANCE

    SOURCE       SS        DF   MS
    REGRESSION   1667.04   1    1667.04
    ERROR        887.584   8    110.948
    TOTAL        2554.63   9

    ** SIGNIFICANT AT 1% LEVEL

    VARIABLE   COEFFICIENT   STD ERROR   T STATISTIC
    2          .397842       .102646     3.87585
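The "Residual" column and the error line of the analysis of variance can be cross-checked with a few lines of Python. This is only a sketch: the two lists are the ACTUAL and CALCULATED columns as read from Figure 9.6.2 (rounded to three decimals), and the variable names are illustrative.

```python
# ACTUAL and CALCULATED columns as read from the printout of Figure 9.6.2
actual = [150, 140, 160, 170, 150, 162, 185, 165, 190, 185]
calculated = [150.701, 151.497, 153.884, 156.669, 160.647,
              166.217, 169.798, 174.572, 182.529, 190.485]

# Residual = observed value minus value calculated from the regression line
residuals = [y - y_hat for y, y_hat in zip(actual, calculated)]

sse = sum(e ** 2 for e in residuals)     # unexplained (error) sum of squares
mse = sse / (len(actual) - 2)            # unexplained mean square, n - 2 degrees of freedom

print([round(e, 3) for e in residuals])
print(round(sum(residuals), 3), round(sse, 2), round(mse, 2))
```

Least-squares residuals sum to zero apart from rounding, so a sum far from zero would signal a misread entry in one of the columns.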


Exercises 


In each of these exercises, construct (a) the 95% confidence interval, and (b) the 95%
prediction interval using the indicated value of X to obtain \(\hat{y}\).

9.6.1 Refer to Exercise 9.4.1 and let X = 50.

9.6.2 Refer to Exercise 9.4.2 and let X = 100.

9.6.3 Refer to Exercise 9.4.3 and let X = 60.

9.6.4 Refer to Exercise 9.4.4 and let X = 75.

9.6.5 Construct the 95% confidence band for Exercise 9.4.4. 


9.7 THE LINEAR CORRELATION MODEL 

The classic regression model requires that only Y be a random variable. The 
variable X under this model is nonrandom. We have pointed out, however, that 
we can perform a regression analysis when X is random. For the correlation model, 
which is the model we assume when we want to obtain a measure of the strength 
of the relationship between two variables, both X and Y must be random variables. 

When we choose correlation analysis, we obtain sample observations by se¬ 
lecting a random sample of units of association and taking on each a measurement 
of both X and Y. In this procedure, values of X are not selected in advance but 
occur at random, depending on the unit of association randomly selected in the 
sample. 

In regression analysis, you will recall, one variable is referred to as the
dependent variable and the other the independent variable. Correlation analysis
involving two variables implies that both variables have equal status. It does not
distinguish between them on the basis of dependence and independence. In fact,
we can fit a straight line to the data by minimizing either \(\sum (y_i - \hat{y}_i)^2\) or
\(\sum (x_i - \hat{x}_i)^2\). In other words, we may do a regression of X on Y or Y on X. The fitted lines
in the two cases, in general, will be different. Thus a logical question arises as
to which line to fit.

If our objective is to measure the strength of the relationship between the two 
variables only, it doesn’t matter which line we fit, since the computed measure 
of correlation will be the same in either case. If, however, we wish to use the 
equation describing the relationship between the two variables for estimation and 
prediction, which line we fit does make a difference. We should treat the variable 
for which we wish to estimate means or make predictions as the dependent vari¬ 
able. That is, we should regress this variable on the other variable. 
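The point that the two fitted lines differ while the measure of correlation does not can be seen in a short sketch. The five data pairs below are hypothetical, chosen only to keep the arithmetic small, and the helper function is an illustrative name rather than anything defined in the text.

```python
def least_squares_slope(u, v):
    """Slope of the least-squares regression of v on u."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    return s_uv / s_uu


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

b_yx = least_squares_slope(x, y)   # regression of Y on X
b_xy = least_squares_slope(y, x)   # regression of X on Y

# Drawn on the same axes (X horizontal), the X-on-Y line has slope 1 / b_xy
slope_x_on_y_in_xy_plane = 1 / b_xy

# The product of the two slopes is r^2, whichever variable is called dependent
r_squared = b_yx * b_xy
print(b_yx, slope_x_on_y_in_xy_plane, r_squared)
```

The two lines coincide only when the points fall exactly on a straight line, in which case the product of the slopes is 1.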

Figure 9.7.1 shows a scatter diagram of sample data from a bivariate population 
and the two regression lines that we can obtain from the data. 

FIGURE 9.7.1 Scatter diagram of sample data from a bivariate distribution,
showing regression lines of Y on X and X on Y

Under the correlation model, X and Y vary together in a joint distribution. If
the form of the joint distribution of X and Y is normal, the joint distribution is
called a bivariate normal distribution. When we sample from a bivariate normal
distribution, we can make inferences about the population on the basis of our
analysis of the sample. If, on the other hand, the joint distribution of the population
is not normal, we cannot make inferences about it. However, we can compute
descriptive measures of correlation for the sample.

FIGURE 9.7.2 Three views of a bivariate normal distribution

When sampling is from a bivariate distribution, the following assumptions are 
necessary if inferences about the population are to be valid. 

1. For each value of X there is a normally distributed subpopulation of Y values. 

2. For each value of Y there is a normally distributed subpopulation of X values. 

3. The joint distribution of X and Y is a normal distribution called the bivariate 
normal distribution. 

Figure 9.7.2 is a graph of the bivariate normal distribution. It shows that if we
slice the mound parallel to Y at some value of X, we will see the corresponding
normal distribution of Y. Similarly, a slice through the mound parallel to
X at some value of Y reveals the corresponding normally distributed
subpopulation of X.

(a) A bivariate normal distribution (b) Cut-away showing normally distributed
subpopulation of Y for given X (c) Cut-away showing normally distributed
subpopulation of X for given Y




9.8 THE CORRELATION COEFFICIENT 


The bivariate normal distribution discussed in Section 9.7 has five parameters: \(\sigma_x\),
\(\sigma_y\), \(\mu_x\), \(\mu_y\), and \(\rho\). The first four of these are, respectively, the standard deviations
and means associated with the individual distributions. The other parameter, \(\rho\), is
called the population correlation coefficient. It measures the strength of the linear
relationship between X and Y.

The population correlation coefficient is the square root of \(\rho^2\), the population
coefficient of determination, which we discussed in Section 9.5. Since \(\rho^2\) can
assume values between 0 and 1, inclusive, \(\rho\) can take on values between -1 and
+1, inclusive. When \(\rho = +1\), there is perfect direct linear correlation between
X and Y. When \(\rho = -1\), there is perfect inverse linear correlation between them.
A \(\rho\) of 0 indicates that X and Y are not linearly correlated.

The sign of \(\rho\) is always the same as the sign of \(\beta\), the slope of the regression
line for X and Y. The sample correlation coefficient r, the square root of the
sample coefficient of determination, measures the strength of the relationship
between the sample observations on two variables in the same way that \(\rho\) describes
the relationship in a population. The scatter diagrams in Figure 9.8.1 represent
situations in which r is approaching 0, -1, and +1, respectively.

We usually want to know whether a set of sample data provides sufficient
evidence to indicate that \(\rho \neq 0\). If we can reject the null hypothesis that \(\rho = 0\),
we can conclude that there is a linear relationship between X and Y.

The hypothesis-testing procedure consists of the following steps: 

1. Obtain b by solving Equations 9.4.2 and 9.4.3 or Equation 9.4.5. 

2. Compute \(r^2\) by Equation 9.5.19.

3. Compute the following test statistic:

\[ t = r\sqrt{\frac{n - 2}{1 - r^2}} \qquad (9.8.1) \]

This is distributed as Student’s t distribution with n - 2 degrees of freedom when
\(\rho = 0\) and the distribution of X and Y is bivariate normal. In using Equation
9.8.1, be careful to give r the appropriate sign, which is the same as the sign of
the sample slope b.


FIGURE 9.8.1 Scatter diagrams illustrating different values of the sample
correlation coefficient: (a) No linear relationship, \(r \to 0\); (b) Inverse linear
relationship, \(r \to -1\); (c) Direct linear relationship, \(r \to +1\)



An alternative formula for r is

\[ r = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{\sqrt{n\sum x_i^2 - (\sum x_i)^2}\,\sqrt{n\sum y_i^2 - (\sum y_i)^2}} \qquad (9.8.2) \]

When we use this formula, we do not need to compute b first. Equation 9.8.2 is
usually better when we do not need the regression equation. When we use Equation
9.8.2, the value of r that results will have the correct sign.

When \(\beta_0 = 0\), the t value computed by Equation 9.8.1 is identical to the t
computed by Equation 9.5.12.

We can illustrate these procedures by an example. 


EXAMPLE 9.8.1 A study is made of the relationship between annual sales volume 
and size of shopping centers. Table 9.8.1 shows data on a sample of 10 shopping 
centers. Here X denotes thousands of square feet of building space and Y denotes 
volume of sales in millions of dollars. Can we conclude on the basis of these data 
that X and Y are linearly correlated? 

The appropriate hypotheses are

\[ H_0\colon \rho = 0, \qquad H_1\colon \rho \neq 0 \]

Let \(\alpha = 0.05\). To test this hypothesis, we proceed as follows.

1. The normal equations for these data are

\[ 99.3 = 10a + 1696b \quad\text{and}\quad 31{,}749 = 1696a + 596{,}840b \]

When these equations are solved, they yield

\[ b = 0.048 \]

2. Using Equation 9.5.19, we compute

\[ r^2 = \frac{(0.048)^2\,[596{,}840 - (1696)^2/10]}{1822.79 - (99.3)^2/10} = 0.8514 \]

and we wind up with

\[ r = \sqrt{0.8514} = 0.9227 \]

When we compute r by the alternative formula using the data of this example,
we have

\[ r = \frac{10(31{,}749) - (1696)(99.3)}{\sqrt{10(596{,}840) - (1696)^2}\,\sqrt{10(1822.79) - (99.3)^2}} = 0.9268 \]

Except for rounding error, this is the value we found using Equation 9.5.19.


TABLE 9.8.1 Sales volume and size of shopping center, selected shopping centers
for a given year

X    40    600    60    72    400    90    200    70    80    84     Total: 1696
Y    3.5   25.0   4.8   3.5   30.0   5.0   12.0   4.5   5.0   6.0    Total: 99.3

\(\sum x_i^2 = 596{,}840\)    \(\sum y_i^2 = 1822.79\)    \(\sum x_i y_i = 31{,}749\)
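With the observations of Table 9.8.1 in hand, the sums and the alternative formula for r can be checked directly. The sketch below uses illustrative variable names; it works from unrounded quantities, so the t value it produces is somewhat larger than the 6.7701 obtained from the rounded slope b = 0.048.

```python
import math

# Observations from Table 9.8.1 (X in thousands of square feet, Y in $ millions)
x = [40, 600, 60, 72, 400, 90, 200, 70, 80, 84]
y = [3.5, 25.0, 4.8, 3.5, 30.0, 5.0, 12.0, 4.5, 5.0, 6.0]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Equation 9.8.2: r computed without first fitting the regression line
r = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
)

# Equation 9.8.1: test statistic for H0: rho = 0, with n - 2 degrees of freedom
t = r * math.sqrt((n - 2) / (1 - r ** 2))

print(sum_x, round(sum_y, 1), sum_x2, round(sum_y2, 2), round(sum_xy, 1))
print(round(r, 4), round(t, 2))
```

Either way the statistic far exceeds the critical value 2.306, so the conclusion of the test does not depend on the rounding.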




3. The test statistic is

\[ t = 0.9227\sqrt{\frac{8}{1 - 0.8514}} = 6.7701 \]

Since 6.7701 > 2.306, the critical value of t for 8 degrees of freedom and
\(\alpha = 0.05\) (two-sided test), we reject \(H_0\). We conclude that X and Y are linearly
related; p < 0.010. Since F with 1 and n - 2 degrees of freedom is equal to \(t^2\)
with n - 2 degrees of freedom, we can carry out an alternative test of \(H_0\colon \rho = 0\).
We use as the test statistic

\[ F = \frac{r^2(n - 2)}{1 - r^2} \]

This is the F we found in the analysis of variance in Table 9.5.1. For this example,
we have

\[ F = (6.7701)^2 = \frac{0.8514(8)}{1 - 0.8514} = 45.8358 \]

Fisher's Z

When \(\rho\) is not close to 0, it may not be appropriate to use the normal distribution
to approximate the sampling distribution of r. Therefore we shall use the t statistic
of Equation 9.8.1 only for testing \(H_0\colon \rho = 0\). If we want to test \(H_0\colon \rho = \rho_0\),
where \(\rho_0\) is some value other than 0, we should use another approach. Fisher
(1921) has suggested the following procedure. First we transform r to \(z_r\) as follows:

\[ z_r = \frac{1}{2}\ln\frac{1 + r}{1 - r} \qquad (9.8.3) \]

where ln is a natural logarithm. It can be shown that \(z_r\) is approximately normally
distributed with a mean of

\[ z_\rho = \frac{1}{2}\ln\frac{1 + \rho}{1 - \rho} \]

and estimated standard deviation of

\[ \hat{\sigma}_{z_r} = \frac{1}{\sqrt{n - 3}} \qquad (9.8.4) \]

To test the null hypothesis that \(\rho\) is equal to some value other than 0, the test
statistic is as follows (where \(z_{\rho_0}\) is the value from Appendix Table I for the
hypothesized value of \(\rho\)):

\[ Z = \frac{z_r - z_{\rho_0}}{1/\sqrt{n - 3}} \qquad (9.8.5) \]

The statistic Z follows approximately the standard normal distribution.

To determine \(z_r\) for an observed r and \(z_\rho\) for a hypothesized \(\rho\), we can consult
Table I to avoid the direct use of natural logarithms.


Suppose, for Example 9.8.1, that we wish to test

\[ H_0\colon \rho = 0.95 \quad\text{against}\quad H_1\colon \rho \neq 0.95 \]

at the 0.05 level of significance. Table I shows that for r = 0.93, \(z_r = 1.65839\),
and for \(\rho = 0.95\), \(z_\rho = 1.83178\). The test statistic, then, is

\[ Z = \frac{1.65839 - 1.83178}{1/\sqrt{10 - 3}} = -0.46 \]

Since -1.96 < -0.46 < 1.96, we cannot reject \(H_0\). We must conclude that \(\rho\)
may be 0.95.
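Where Table I is not at hand, the transformation can be evaluated directly from Equation 9.8.3. The following sketch uses hypothetical function names and reproduces the test of the hypothesis that the population correlation is 0.95, with r = 0.93 and n = 10 as in the example.

```python
import math


def fisher_z(value):
    """Fisher's transformation: (1/2) ln((1 + value) / (1 - value))."""
    return 0.5 * math.log((1 + value) / (1 - value))


def fisher_test_statistic(r, rho_0, n):
    """Z of Equation 9.8.5: (z_r - z_rho0) divided by 1 / sqrt(n - 3)."""
    return (fisher_z(r) - fisher_z(rho_0)) * math.sqrt(n - 3)


z_r = fisher_z(0.93)
z_rho = fisher_z(0.95)
z_stat = fisher_test_statistic(0.93, 0.95, 10)

print(round(z_r, 5), round(z_rho, 5), round(z_stat, 2))
# Two-sided test at the 0.05 level: reject H0 only if |Z| exceeds 1.96
print("reject H0" if abs(z_stat) > 1.96 else "cannot reject H0")
```

Computing the transformation directly also avoids the interpolation that a printed table requires for values of r between its entries.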

For sample sizes less than 25, Fisher’s z transformation is usually not
recommended. We may use an alternative method given by Hotelling (1953) for sample
sizes equal to or greater than 10. With this method, we use the following
transformation of r (symbolized by \(z^*\)):

\[ z^* = z_r - \frac{3z_r + r}{4n} \]

The standard deviation of \(z^*\) is

\[ \sigma_{z^*} = \frac{1}{\sqrt{n - 1}} \]

The test statistic is

\[ Z^* = \frac{z^* - \zeta^*}{1/\sqrt{n - 1}} = (z^* - \zeta^*)\sqrt{n - 1} \qquad (9.8.8) \]

where \(\zeta^*\) (pronounced zeta) \(= z_\rho - (3z_\rho + \rho)/4n\). We obtain critical values from
the standard normal distribution.

In the present example, if we test \(H_0\colon \rho = 0.95\) against \(H_1\colon \rho \neq 0.95\), with
\(\alpha = 0.05\), the Hotelling transformation gives

\[ z^* = 1.65839 - \frac{3(1.65839) + 0.93}{4(10)} = 1.51076 \]

and

\[ \zeta^* = 1.83178 - \frac{3(1.83178) + 0.95}{4(10)} = 1.67065 \]

We may use these results to compute

\[ Z^* = (1.51076 - 1.67065)\sqrt{10 - 1} = -0.48 \]

Since -0.48 is greater than -1.96, we cannot reject \(H_0\). We reach the same
conclusion we did with Fisher’s transformation.
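The Hotelling calculation follows the same pattern and can be sketched in a few lines. The function names are illustrative; the same correction is applied to the sample value (giving z*) and to the hypothesized value (giving zeta*), so one helper serves for both.

```python
import math


def fisher_z(value):
    """Fisher's transformation: (1/2) ln((1 + value) / (1 - value))."""
    return 0.5 * math.log((1 + value) / (1 - value))


def hotelling_transform(value, n):
    """z - (3z + value) / (4n), where z is Fisher's transformation of value."""
    z = fisher_z(value)
    return z - (3 * z + value) / (4 * n)


n = 10
z_star = hotelling_transform(0.93, n)      # from the sample correlation r = 0.93
zeta_star = hotelling_transform(0.95, n)   # from the hypothesized rho = 0.95
z_star_stat = (z_star - zeta_star) * math.sqrt(n - 1)   # Equation 9.8.8

print(round(z_star, 5), round(zeta_star, 5), round(z_star_stat, 2))
```

The statistic is again compared with critical values of the standard normal distribution, here plus or minus 1.96 for a two-sided test at the 0.05 level.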


Confidence Interval for \(\rho\)

We can use Fisher’s transformation to construct \(100(1 - \alpha)\%\) confidence intervals
for \(\rho\). We use the general formula for a confidence interval:

Estimate ± (reliability factor) × (standard error)

We convert the estimator r to \(z_r\), and construct a confidence interval about \(z_\rho\). We
reconvert the limits to obtain a \(100(1 - \alpha)\%\) confidence interval about \(\rho\). The
general formula for a confidence interval for \(z_\rho\) is given by

\[ z_r \pm z\,\hat{\sigma}_{z_r} \qquad (9.8.9) \]

For this example, the 95% confidence interval for \(z_\rho\) is

\[ 1.65839 \pm 1.96\left(1/\sqrt{10 - 3}\right) \]

1.65839 ± 0.74080, 0.91759, 2.39919

Converting these limits, which are values of \(z_r\), into values of r gives the following
95% confidence interval for \(\rho\):

0.725, 0.983

Because of the limited number of entries in Table I, we must consider this interval,
obtained by rough interpolation, only approximate.
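The conversion back from z to r need not rely on table interpolation, because Fisher's transformation is the inverse hyperbolic tangent and its inverse is the hyperbolic tangent. A minimal sketch, with a hypothetical function name and the example's r = 0.93 and n = 10:

```python
import math


def rho_confidence_interval(r, n, z_crit=1.96):
    """Confidence interval for rho via Fisher's transformation (Expression 9.8.9)."""
    z_r = math.atanh(r)                   # same as (1/2) ln((1 + r) / (1 - r))
    half_width = z_crit / math.sqrt(n - 3)
    # tanh inverts the transformation, converting the z limits back to r
    return math.tanh(z_r - half_width), math.tanh(z_r + half_width)


lower, upper = rho_confidence_interval(0.93, 10)
print(round(lower, 3), round(upper, 3))
```

The interval is not symmetric about r, since the transformation stretches values near 1 far more than values near 0.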

Alternatively we can use special charts prepared by David (1938) for construct¬ 
ing confidence intervals for p. When the correlation assumptions of this chapter 
are not met, we can compute a measure of correlation known as the Spearman 
rank correlation coefficient. We discuss the procedure in Chapter 12. 

9.8.1 The following table shows the amount spent for insurance during a year (Y) and
volume of business (X) in tons hauled for a random sample of trucking firms of a certain
type. All figures are in thousands.

Y, $       13   18   14   18   23   21   14   25   23   14
X, tons    10   16   12   18   17   17    9   19   17   11

\(\sum x_i = 146\)   \(\sum y_i = 183\)   \(\sum x_i y_i = 2804\)   \(\sum x_i^2 = 2254\)   \(\sum y_i^2 = 3529\)

(a) Plot the data as a scatter diagram. (b) Test \(H_0\colon \rho = 0\) at the 0.05 significance level.
(c) Test \(H_0\colon \rho = 0.95\). (d) Determine the p value for each test. (e) Interpret \(r^2\). (f) Discuss
the way the sampling scheme in this case makes correlation analysis, as well as regression
analysis, appropriate.

9.8.2 The following table shows the scores on a clerical aptitude test (X) and grades in a
clerical skills course (Y) for 10 business students.

X    60   70   65   72   75   75   82   84   90   95
Y    68   72   76   78   80   86   82   90   96   93

\(\sum x_i = 768\)   \(\sum y_i = 821\)   \(\sum x_i y_i = 63{,}885\)   \(\sum x_i^2 = 60{,}064\)   \(\sum y_i^2 = 68{,}153\)

(a) Plot the data as a scatter diagram. (b) Perform the correct hypothesis test to determine
whether the data provide sufficient evidence to indicate that X and Y are linearly correlated.
Let \(\alpha = 0.05\). (c) Test \(H_0\colon \rho = 0.98\). (d) Determine the p value for each test.

9.8.3 The following table shows the expenditures for equipment maintenance (X) and net
income before taxes (Y) for a random sample of 10 firms of a certain type. All figures are
coded for ease of calculation.

X, $    10   20   25   30   36   42   54   62   74   82
Y, $    12   24   14   18   18   28   26   40   38   54

\(\sum x_i = 435\)   \(\sum y_i = 272\)   \(\sum x_i y_i = 14{,}438\)   \(\sum x_i^2 = 24{,}045\)   \(\sum y_i^2 = 8984\)

(a) Plot the data as a scatter diagram. (b) Determine whether we can conclude from these
data that the two variables are linearly related. Let \(\alpha = 0.05\). (c) Test \(H_0\colon \rho = 0.95\). (d)
Determine the p value for each test.


9.9 CONSIDERATIONS IN DECIDING BETWEEN 
REGRESSION AND CORRELATION 

The nature of our questions is the primary determining factor in choosing between 
regression analysis and correlation analysis. As previously noted, regression anal¬ 
ysis is better when we want to determine the nature of the relationship between 
two variables (for example, to determine whether they are linearly related), to 
predict what value Y is likely to assume for a given value of X, and to estimate 
means of subpopulations of Y values. If we are interested in assessing the strength 
of the relationship between two variables, then correlation analysis will suffice. 

We emphasize that the correlation model requires that both X and Y be random 
variables. In order for inferential procedures to be valid, the two variables must 
follow a bivariate normal distribution. Typically we obtain the data for correlation 
analysis by randomly drawing a sample of subjects (units of association) from the 
population of interest and taking a measurement on each of the two variables X 
and Y. We therefore conduct our analysis on all the observed values. That is, we 
place no restriction on what values of either variable may enter into the analysis. 

Regression analysis has broader applications, since it does not require that both 
variables be random. For example, an investigator interested in studying the nature 
of the relationship between family income and propensity to save may feel that 
studying only families whose incomes are of certain specified magnitudes would 
be more meaningful and practical. To obtain the data for such an analysis, the 
investigator would select for study only families with the specified incomes. When 
data are gathered in this manner, the investigator is considered to have “control”
over one of the variables. In this example, the variable of family income is 
controlled. As we have seen, the variable that is controlled is called the inde¬ 
pendent variable. 

In correlation analysis, where no control is exercised on any variables, we don’t 
speak of “independent” and “dependent” variables. Thus in regression analysis 
the independent variable does not have to represent either the complete set of 
values that might occur or the proper relative frequencies with which the selected 
values occur. Correlation analysis, on the other hand, should be based on a sample 
selected from the entire set of values that occur in the given situation. 



9.10 SOME PRECAUTIONS 


When properly used, regression and correlation analysis are powerful statistical 
techniques. Their inappropriate use, however, can lead only to meaningless re¬ 
sults. We offer the following suggestions: 

1. Carefully review the assumptions underlying regression and correlation anal¬ 
ysis before you collect the data. It is rare for assumptions to be met to perfection. 
However, you should have some idea of the magnitude of the gap between the 
data to be analyzed and the assumptions of the proposed model. That way, you 
can decide whether you should choose another model, proceed with the analysis 
but use caution in interpreting results, or use the chosen model with confidence. 

One alternative, when the assumptions of this chapter are not met, is to use a 
nonparametric technique for analysis. We shall discuss this alternative in greater 
detail in Chapter 12, on nonparametric statistics. 

2. No matter how strong the indication of a relationship between two variables, 
do not interpret it as one of cause and effect. For example, suppose that you 
observe a significant sample correlation coefficient between two variables X and 
Y. This can mean one of several things: (a) X causes Y. (b) Y causes X. (c) Some 
third factor, either directly or indirectly, causes both X and Y. (d) An unlikely 
event has occurred, and a large sample correlation coefficient has been generated 
by chance from a population in which X and Y are in fact not correlated. Or: (e) 
The correlation is purely nonsensical, a situation that may arise when
measurements of X and Y are not taken on a common unit of association.

3. Do not use the sample regression equation to predict or estimate outside the 
range of values of the independent variable represented in the sample. This
practice, called extrapolation, can have dangerous consequences. The true relationship
between two variables may be linear over an interval of the independent variable, 
but may best be described as a curve outside this interval. If the sample happens 
to be drawn only from the interval where the relationship is linear, it provides 
only a limited representation of the population. To project the sample results 
beyond the interval represented by the sample may lead to false conclusions. 
Figure 9.10.1 shows one of the possible pitfalls of extrapolation. 

4. Researchers often find that, among the data they have collected as part of an 
experiment or a sample survey, there are one or more observations that seem to 
be “unusual” relative to the majority of the observations. These unusual
observations are variously called “spurious,” “unrepresentative,” “mavericks,” and
“outliers.” They may be either much smaller or larger than the bulk of the 
observations collected. The presence of outliers may cause misleading results. 
Analyzing residuals makes the detection of outliers easier. What to do about 
outliers has bothered researchers for more than 100 years. Areas of concern are: 
(1) How can one be sure that a given observation is a true outlier, that is, that it 
does not “belong” to the set of data under consideration? (2) What is the cause 
of a confirmed outlier? (3) What should one do with confirmed outliers? Attempts 
to answer the questions posed by outliers have generated a large body of literature,
dating back to the mid-1800s. [The problem of outliers is discussed by Kleinbaum
and Kupper (1978), Younger (1979), and others. Daniel (1980) has prepared a
bibliography on outliers in research data.]

FIGURE 9.10.1 Example of possible danger of using extrapolation in linear regression


Summary


This chapter presented two important tools of statistical analysis, simple linear 
regression and correlation. It suggested the following outline for the application 
of these techniques: 

1. Identify the model. 

2. Review the assumptions. 

3. Obtain the regression equation. 

4. Evaluate the equation. 

5. Use the equation. 

You saw how you can use an analysis of residual plots to determine whether 
or not it is likely that the data violate the assumptions underlying regression 
analysis. 

You saw that correlation analysis, though closely related to regression analysis, 
is used for a different purpose: to study the strength of the relationship between 
two variables. Correlation analysis is strictly valid only when both X and Y are 
continuous random variables. However, regression analysis is valid when X is 
either fixed or random. You learned that you can use regression analysis to predict 
the value Y is likely to assume for a given X. You can also use it to estimate the 
mean of the subpopulation of Y values at a given value of X. The chapter discussed 
methods for testing hypotheses and constructing confidence intervals under both 
the regression and the correlation models. 

This chapter dealt with regression and correlation in which the model specifies 
a linear relationship between X and Y. However, sometimes we can best describe 
the relationship between two variables by a second-degree curve, or by one that 
is even more complicated. The texts by Draper and Smith (1981), Neter and 
Wasserman (1974), Snedecor and Cochran (1980), and Williams (1959) treat the 
techniques and concepts appropriate for these models. 



Sometimes we can predict a dependent variable more precisely using two or 
more independent variables rather than one. There are also situations in which we 
are more interested in knowing the strength of the relationship among several 
variables than in knowing the relationship between only two variables. Chapter 
10 will explore these possibilities. 

Review Questions

1. In the equation y = a + bx, explain: (a) two ways we may interpret y, (b) the
meaning of a, and (c) the meaning of b.

2. What is a scatter diagram? 

3. Why is the regression line called the least-squares line? 

4. What are the basic assumptions underlying regression analysis? 

5. Give three interpretations of the coefficient of determination. 

6. For the following expression, explain each of the terms, express each term
symbolically, and draw a picture to illustrate the relationship:

Total sum of squares = explained sum of squares + unexplained sum of squares 

7. What is the function of the analysis of variance in regression analysis? 

8. Describe three ways of testing the null hypothesis that $\beta = 0$.

9. What are the assumptions underlying simple correlation analysis when inference is an 
objective? 

10. What is meant by the unit of association in regression and correlation analysis? 

11. What are the possible explanations for a significant sample correlation coefficient?

12. Explain why it is risky to use a sample regression equation to predict or estimate 
outside the range of values of the independent variable represented in the sample. 

13. Describe a situation in your area of interest in which simple regression analysis would 
be useful. Use real or realistic data and do a complete regression analysis. 

14. Describe a situation in your area of interest in which simple correlation analysis would 
be useful. Use real or realistic data and do a complete correlation analysis. 

15. The following table shows the age and efficiency rating of a random sample of 20
assembly-line employees. (a) Obtain the equation describing the linear relationship between
age and efficiency rating. (b) Compute r. (c) Do the data provide sufficient evidence to
indicate that the two variables are correlated? (d) Determine the p value for the test.




16. A survey is conducted among customers holding charge cards from a certain department
store. Researchers ask each of a random sample of 16 customers to estimate the
amount he or she charged at the store during the past month. The estimates and actual
charges obtained from store records are as follows (amounts are rounded to the nearest
dollar). (a) Compute r and test for significance at the 0.05 level. (b) Find the p value for
the test.

Customer   Actual   Estimate      Customer   Actual   Estimate
    1        85        85             9        84        75
    2        96       100            10        41        50
    3        49        50            11        67        75
    4        97       100            12        72        75
    5        90        75            13        92       100
    6        28        50            14        29        30
    7        25        30            15        74        60
    8        28        35            16        75        90


17. A garment manufacturer wants to know the relationship between the age and annual
maintenance costs of sewing machines. A sample of 16 machines reveals the following
ages and maintenance costs during the past year. (a) Find the sample regression equation.
(b) Do the data provide sufficient evidence at the 0.05 level to indicate that the two
variables are related? Use ANOVA. (c) Determine the p value for the test. (d) What is
the expected annual maintenance cost for a seven-year-old machine?

Age (years)   Maintenance costs (dollars)      Age (years)   Maintenance costs (dollars)
     8                  109                         1                   25
     3                   75                         3                   70
     1                   21                         6                  126
     9                  135                         2                   58
     5                   67                         1                   30
     7                  125                         2                   47
     5                   71                         6                  120
     2                   52                         8                  105


18. The following are scores on a sales aptitude test made by 15 salespersons, and their
performance ratings as given by their supervisor. (a) Find the sample regression equation.
(b) Use ANOVA to evaluate the equation. Let $\alpha = 0.05$. (c) Determine the p value for
the test.

Aptitude test score (X)   Performance rating (Y)      Aptitude test score (X)   Performance rating (Y)
          92                       70                            84                       63
          77                       57                            70                       51
          83                       65                            85                       66
          72                       55                            81                       62
          81                       62                            76                       53
          90                       79                            76                       52
          79                       57                            70                       59
          91                       73




19. The following table shows the hardness, measured in units of Brinell hardness, and
tensile strength, in thousands of pounds per square inch, of 15 specimens of a metal alloy.
(a) Plot a scatter diagram of these data. (b) Find the sample regression equation. (c) Test
the null hypothesis that $\beta = 0$. (d) Compute $r^2$. (e) Suppose that a new specimen has a
hardness of 50. Estimate the tensile strength with a 95% confidence interval.

Hardness   Tensile strength      Hardness   Tensile strength
   41            27                 41            23
   74            42                 96            51
  100            62                 20            15
   72            43                 45            22
   67            37                 38            19
   55            31                 26            17
   71            45                 29            17
   35            21




20. A professor is doing a study of the relationship between students’ grades and their
part-time work. She analyzes data from a random sample of 15 college students. The
numbers of hours worked per week and the grade-point averages of these students are as
follows. (a) Plot a scatter diagram of these data. (b) Find the least-squares regression
equation. (c) Do these data provide sufficient evidence to indicate that there is a relationship
between grade-point average and number of hours worked? (d) Determine the p value for
the test.

Hours/week worked (X)   G.P.A. (Y)      Hours/week worked (X)   G.P.A. (Y)
         10                2.5                   32                3.9
         22                3.0                   29                2.7
         24                2.3                   29                1.4
         28                3.5                    5                2.4
          9                4.0                   25                1.8
         10                1.8                    9                1.7
          7                2.8                   17                2.9
          7                3.2




21. From Table D in the Appendix, select a random sample of 15 pairs of one-digit
numbers. Compute the sample correlation coefficient. Test $H_0: \rho = 0$ against the alternative
$H_1: \rho \neq 0$ at the 0.05 level of significance. Compare your results with those of other
members of the class.

22. For each of the following situations, indicate whether one should use regression
analysis or correlation analysis. (a) An industrial psychologist wants to know the relationship
between intelligence and job satisfaction of assembly-line employees. (b) In a factory,
there are two methods for measuring the durability of a certain product. One (the direct
method) is expensive and hard to perform, but gives a true measure of durability. A second
(the indirect method) provides an indirect measure of durability. Officials will switch to
the less expensive indirect method if it can be shown that this method is a good predictor
of the results that the direct method would give. (c) A drug company wants to know the
average reduction in reaction time to a certain stimulus of people who take various strengths
of their drug. The strengths are 1-, 2-, 3-, and 4-milligram doses. (d) Medical researchers
wish to know whether high levels of exercise are associated with low levels of serum
cholesterol in adults.

23. The president of a firm wants to find out the relationship between the effectiveness of
salespersons, as measured in dollar volume of sales, and their aptitude for selling. Data
are collected on a sample of 15 salespersons and analyzed by means of regression analysis.
The independent variable is the salesperson’s score on a sales aptitude test. The dependent
variable is mean annual sales, in $10,000 units, over the past five years. The following
computer printout shows the results of the analysis. (a) Write out the sample regression
equation. (b) Should $H_0: \beta = 0$ be rejected at the 0.05 level of significance?

    VARIABLE                  REGR COEFF      MEAN VALUE    STD DEV
    1 (CONSTANT = 4.78001)                    11.4667       2.47301
    2                         .897939E-01     74.4666       18.2939

    INDEX OF DETERMINATION (R-SQ) = .441219

    CORRELATION MATRIX
    1           .664244
    .664244     1

    DO YOU WISH TABLE OF VALUES PRINTED? YES

    ACTUAL VS CALCULATED
    ACTUAL    CALCULATED    RESIDUAL         PCT RESIDUAL
    11        12.2329       -1.23291         -10
    13        13.3104       -.310435         -2.3
    8         9.3595        -1.3595          -14.5
    14        13.6696       .330389          2.4
    14        11.1554       2.84462          25.4
    15        12.1431       2.85689          23.5
    10        10.0779       -.778542E-01     -.7
    13        11.1554       1.84462          16.5
    13        9.98806       3.01194          30.1
    11        10.9758       .242071E-01      .2
    6         9.4493        -3.4493          -36.5
    8         8.64115       -.641151         -7.4
    13        13.2206       -.220641         -1.6
    11        12.9513       -1.95126         -15
    12        13.6696       -1.66961         -12.2

    ANALYSIS OF VARIANCE
    SOURCE        SS          DF    MS
    REGRESSION    40.476      1     40.476
    ERROR         51.2608     13    3.94314
    TOTAL         91.7368     14
    **SIGNIFICANT AT 1% LEVEL

    VARIABLE    COEFFICIENT     STD ERROR       T STATISTIC
    2           .897939E-01     .280274E-01     3.20379
    D.F. = 13
    F = 10.2649**



24. Select a simple random sample of size 40 from the population of employed heads of 
household in Appendix II. Do a complete regression analysis, using education as the 
independent variable and annual salary as the dependent variable. 






Is Ignorance Bliss? 


Lee Sigelman* subjected the old adage "Ignorance is bliss" to modern scientific
scrutiny. He gave subjects in his study a shortened form of the Thorndike
intelligence test, which measures happiness in terms of a three-point scale. Low
scores indicate happiness, high scores unhappiness. Sigelman collected data on
2650 subjects, and reported a simple correlation coefficient between
intelligence and happiness of -0.064. For testing $H_0: \rho = 0$ versus
$H_1: \rho \neq 0$, what are the degrees of freedom? What value of the test
statistic can one compute from the reported result? Can one reject $H_0$ at the
0.05 level? Why? What is the p value for this test? What can one conclude from
the test of the hypothesis?

*Lee Sigelman, "Is Ignorance Bliss? A Reconsideration of the Folk Wisdom," Human Relations, 34 (1981), 
965-974. 
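
The test statistic asked for in this case can be obtained from the standard t transformation of a sample correlation coefficient, t = r sqrt((n - 2)/(1 - r^2)), with n - 2 degrees of freedom. The following is a minimal Python sketch under that assumption; the function name `t_from_r` is ours and does not come from the text.

```python
import math


def t_from_r(r, n):
    """Return (t, df) for testing H0: rho = 0 from a sample correlation r and sample size n."""
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, df


# Values reported in the case: r = -0.064, n = 2650.
t, df = t_from_r(-0.064, 2650)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

With a sample this large, even a very small correlation can yield a test statistic well beyond the tabulated critical value, which is part of what the case invites you to discuss.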


The Poisoning of Livestock by Insecticides 


In recent years environmentalists and health professionals have been
concerned about the ill effects on the environment of the widespread use of
insecticides. Insecticides kill insects, and much else as well. They often find
their way into the systems of animals, and even people. In order for human
beings to cope with the problem and make decisions about how to deal with
it, they must understand the effect of insecticides on humans and other
animals. In the kind of study often done to promote such understanding, Mount
and Oehme* investigated the effect of a commonly used insecticide on sheep.
Among other statistical analyses, they derived the following linear regression
equation (n = 16). This equation describes the relationship between the
activity of a certain enzyme in the sheep's brain (Y) and the time (in hours) after
the sheep has been exposed to the insecticide (X):

y = 27.32 + 1.36x

Suppose that 30 hours have elapsed since a sheep has been exposed to the
insecticide. What is the predicted value of Y? How would you describe the
relationship between these two variables?

Mount and Oehme computed a coefficient of determination of 0.86 from
their data. What conclusion can one draw about the true relationship between
the two variables? Let $\alpha = 0.05$. Find the p value. What assumptions are
necessary?

*Michael E. Mount and Frederick W. Oehme, "Diagnostic Criteria for Carbaryl Poisoning in Sheep," Archives 
of Environmental Contamination and Toxicology, 10 (1981), 483-495. 
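
The two calculations this case calls for can be sketched in Python as follows. The first function simply evaluates the reported regression line. The second assumes the usual identity for simple linear regression, F = r^2 (n - 2)/(1 - r^2) with 1 and n - 2 degrees of freedom, which follows from SSR = r^2 SST and SSE = (1 - r^2) SST. Both function names are ours, not the authors'.

```python
def predict_enzyme_activity(hours):
    """Evaluate the reported sample regression line y = 27.32 + 1.36x."""
    return 27.32 + 1.36 * hours


def f_from_r_squared(r_squared, n):
    """F statistic for H0: beta = 0 in simple linear regression (1 and n - 2 df)."""
    return r_squared * (n - 2) / (1.0 - r_squared)


# Values given in the case: x = 30 hours, r^2 = 0.86, n = 16.
print(round(predict_enzyme_activity(30), 2))
print(round(f_from_r_squared(0.86, 16), 1))
```

The resulting F would then be compared with the tabulated F for 1 and 14 degrees of freedom at the chosen significance level.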


10. Multiple Regression and Correlation


Chapter Objectives: In Chapter 9 you studied techniques for investigating the
relationships between two variables. In this chapter we extend the discussion to
include relationships among three or more variables. After studying this chapter
and working the exercises, you should be able to do the following.

1. Write and explain the multiple-regression model 

2. Obtain a multiple-regression equation from sample 
data 

3. Evaluate a sample regression equation by means of 
analysis of variance 

4. Test null hypotheses about and construct confidence intervals for individual
population regression coefficients

5. Use a sample multiple-regression equation for prediction and estimation

6. Compute a coefficient of multiple correlation from 
sample data 

7. Test the sample multiple and partial correlation 
coefficients for significance 




10.1 INTRODUCTION 


Chapter 9 presented the concepts and techniques of regression and correlation as 
tools for analysis of the relationships between two variables. It showed that
regression analysis, properly done, leads to an equation that we can use to predict the
likely value of one variable, given the value of some other associated variable. 
Similarly, it showed that we can use correlation analysis to assess the strength of 
the relationship between two variables. 

One would think that if we can predict the value of a variable on the basis of 
knowing one associated variable, we might be able to make an even better
prediction given knowledge of several associated variables. We also often want to
obtain some measure of the strength of the relationship among several variables 
rather than only two. A market analyst, for example, may correctly predict sales 
of a company’s product in a given area from a knowledge of such things as the 
age composition of the population in the area, per capita income, population 
density, and the amount of money the firm spends on advertising. A personnel 
director may find that the productivity of employees is related to such factors as 
their experience, education, intelligence, emotional stability, and aptitude. A
quality-control engineer may find that the quality of the product depends on such
variables as temperature, humidity, and pressure under which it is produced, as 
well as the quality of the raw material and the amount of some key ingredient. 

We can study relationships such as these by means of multiple-regression and 
multiple-correlation analysis. The techniques are logical extensions of those used 
in simple linear regression and correlation. Thus the following discussion parallels 
that of Chapter 9. We shall present the multiple-regression model and its
underlying assumptions first. Then we’ll look at ways to obtain and evaluate the
multiple-regression equation. We shall also illustrate the use of the equation for
predicting and estimating. Finally, we shall present the multiple-correlation model,
assumptions, and analyses. 


10.2 THE MULTIPLE-REGRESSION MODEL AND ITS UNDERLYING ASSUMPTIONS

We can write the multiple-regression model as follows: 

$$y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_k x_{kj} + \epsilon_j \tag{10.2.1}$$

where $y_j$ is a typical value of Y, the dependent variable, from the population of
interest; $\beta_0, \beta_1, \ldots, \beta_k$ are the population partial regression coefficients; and
$x_{1j}, x_{2j}, \ldots, x_{kj}$ are observed values of the independent variables $X_1, X_2, \ldots, X_k$,
respectively.
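
To make the structure of Equation 10.2.1 concrete, here is a small Python sketch that evaluates the systematic part of the model, $\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$, for one observation. The function name and all numerical values are hypothetical and serve only as an illustration.

```python
def mean_response(betas, xs):
    """Return beta_0 + beta_1*x_1 + ... + beta_k*x_k for one observation.

    betas: [beta_0, beta_1, ..., beta_k]; xs: [x_1, ..., x_k].
    """
    if len(betas) != len(xs) + 1:
        raise ValueError("need exactly one more coefficient than independent variables")
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))


# Hypothetical coefficients and x values, chosen only for illustration.
betas = [2.0, 0.5, -1.0]
xs = [4.0, 3.0]
mu = mean_response(betas, xs)   # 2.0 + 0.5*4.0 - 1.0*3.0
error = 0.25                    # a hypothetical value of the error term
y_observed = mu + error         # an observed y is the mean response plus its error
print(mu, y_observed)
```

In practice the $\beta$'s are unknown and the error term is never observed directly; the sketch only shows how the pieces of the model fit together.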

The following are the necessary assumptions underlying the multiple-regression 
model when inference is an objective of the analysis: 

1. The $X_i$ may be either random or nonrandom (fixed) variables. Because of their
role in explaining the variability in the dependent variable Y, they are sometimes
referred to as explanatory variables. The X's also are sometimes referred to as
predictor variables, because of their role in predicting Y.

2. For each combination of $X_i$ values, there is a normally distributed
subpopulation of Y values.

3. The variances of the subpopulations of Y values are all equal.

4. The Y values are independent. This means that the value of Y selected for one
value of X does not depend on the value selected for another value of X.

5. The $\epsilon_j$ are normally and independently distributed, with mean 0 and
variance $\sigma^2$.

You know that we can describe the linear relationship between two variables 
by a straight line. But how can we describe the linear relationship between several 
variables? We do so by means of a regression surface: a plane when three vari¬ 
ables are involved, or a hyperplane when there are more than three variables. 


10.3 OBTAINING THE SAMPLE MULTIPLE-REGRESSION EQUATION 


To find the sample multiple-regression equation, we must first get a set of normal 
equations. We get the normal equations by the method of differential calculus, in 
which the quantity 

$$\sum e_j^2 = \sum (y_j - b_0 - b_1 x_{1j} - b_2 x_{2j} - \cdots - b_k x_{kj})^2 \tag{10.3.1}$$


is minimized. Equation 10.3.1 represents the sum of the squared deviations of the 
observed values of Y from the regression surface. Thus the method of least squares 
is used for multiple regression just as it was for simple linear regression. In the 
multiple-regression case, the sum of the squared deviations of the observed values 
from the regression surface is minimized. 

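The normal equations for k independent variables are as follows:

$$\begin{aligned}
n b_0 + b_1 \sum x_{1j} + b_2 \sum x_{2j} + \cdots + b_k \sum x_{kj} &= \sum y_j \\
b_0 \sum x_{1j} + b_1 \sum x_{1j}^2 + b_2 \sum x_{1j} x_{2j} + \cdots + b_k \sum x_{1j} x_{kj} &= \sum x_{1j} y_j \\
b_0 \sum x_{2j} + b_1 \sum x_{1j} x_{2j} + b_2 \sum x_{2j}^2 + \cdots + b_k \sum x_{2j} x_{kj} &= \sum x_{2j} y_j \\
&\;\;\vdots \\
b_0 \sum x_{kj} + b_1 \sum x_{1j} x_{kj} + b_2 \sum x_{2j} x_{kj} + \cdots + b_k \sum x_{kj}^2 &= \sum x_{kj} y_j
\end{aligned} \tag{10.3.2}$$
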


Note that the number of equations is the same as the number of parameters to
be estimated. Solving these normal equations leads to the following sample
regression equation:


$$\hat{y}_j = b_0 + b_1 x_{1j} + b_2 x_{2j} + \cdots + b_k x_{kj} \tag{10.3.3}$$
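
The route from data to Equation 10.3.3 can be sketched in Python: form the sums that appear in the normal equations, then solve the resulting system. The sketch below uses only the standard library and solves the system by Gaussian elimination, which is our choice for illustration and not a method prescribed by the text. The function names and the data are hypothetical; the data are constructed to lie exactly on a known plane so that the result is easy to check.

```python
def normal_equations(rows, y):
    """Build the (k+1) x (k+1) normal equations A b = c from raw data.

    rows: one [x_1, ..., x_k] list per observation; y: the observed values.
    """
    design = [[1.0] + [float(v) for v in r] for r in rows]   # leading 1 carries b_0
    m = len(design[0])
    A = [[sum(d[p] * d[q] for d in design) for q in range(m)] for p in range(m)]
    c = [sum(d[p] * yj for d, yj in zip(design, y)) for p in range(m)]
    return A, c


def solve(A, c):
    """Solve A b = c by Gaussian elimination with partial pivoting."""
    m = len(c)
    M = [row[:] + [ci] for row, ci in zip(A, c)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for q in range(col, m + 1):
                M[r][q] -= factor * M[col][q]
    b = [0.0] * m
    for r in range(m - 1, -1, -1):
        b[r] = (M[r][m] - sum(M[r][q] * b[q] for q in range(r + 1, m))) / M[r][r]
    return b


# Hypothetical data lying exactly on the plane y = 1 + 2*x1 + 3*x2.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
A, c = normal_equations(rows, y)
b = solve(A, c)
print([round(v, 6) for v in b])
```

Note that the first entry of the first normal equation is the sample size n, and that the system has k + 1 equations in the k + 1 unknowns $b_0, b_1, \ldots, b_k$.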


When the model contains only two independent variables, the sample regression 
equation is given by 


$$\hat{y}_j = b_0 + b_1 x_{1j} + b_2 x_{2j} \tag{10.3.4}$$

where $b_0$ is the Y intercept of the plane, and $b_1$ and $b_2$ are the slopes of the plane
associated with $X_1$ and $X_2$, respectively. The amount by which $\hat{y}$ changes for a
unit change in $x_1$, with $x_2$ held constant, is given by $b_1$. Similarly, $b_2$ is the amount
by which $\hat{y}$ changes for each unit change in $x_2$ when $x_1$ is held constant. It is for
this reason that the $b_i$ and the parameters they estimate are called partial regression
coefficients.
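
The "held constant" interpretation can be checked numerically. In the following Python sketch the coefficients are hypothetical, chosen so that the arithmetic is exact; the point is only that raising one independent variable by one unit, with the other fixed, changes the fitted value by the corresponding partial regression coefficient.

```python
def fitted_value(b0, b1, b2, x1, x2):
    """Evaluate a sample regression plane y-hat = b0 + b1*x1 + b2*x2."""
    return b0 + b1 * x1 + b2 * x2


# Hypothetical coefficients, for illustration only.
b0, b1, b2 = 10.0, 2.0, 0.5

base = fitted_value(b0, b1, b2, x1=3.0, x2=8.0)
step_x1 = fitted_value(b0, b1, b2, x1=4.0, x2=8.0)   # x1 up one unit, x2 held constant
step_x2 = fitted_value(b0, b1, b2, x1=3.0, x2=9.0)   # x2 up one unit, x1 held constant

print(step_x1 - base)   # change attributable to x1
print(step_x2 - base)   # change attributable to x2
```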

Figure 10.3.1 shows a typical scatter diagram of sample values and the
corresponding regression plane for the case of two independent variables.

To show the calculations you need to find the sample regression equation in 
the three-variable case, consider the following example. Here we first present the 
formulas and computations you need to obtain the regression equation when you 
don’t have a computer. At the end of Section 10.5 we discuss the way to obtain 
the regression equation when you do have a computer. Figure 10.5.1 shows a 
typical computer printout containing numerical values of the coefficients for the 
equation. 


EXAMPLE 10.3.1 A market-research firm wants to predict weekend circulation of 
daily newspapers in various market areas. The firm selects two variables, total 
retail sales and population density, as the independent variables. A random sample 
of n = 25 trade areas gives the results shown in Table 10.3.1. 

We assume that the multiple-regression model of Equation 10.2.1, for three 
variables, applies here. Since there are measurements on three variables, we need 
three normal equations. They are as follows: 


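$$\begin{aligned}
n b_0 + b_1 \sum x_{1j} + b_2 \sum x_{2j} &= \sum y_j \\
b_0 \sum x_{1j} + b_1 \sum x_{1j}^2 + b_2 \sum x_{1j} x_{2j} &= \sum x_{1j} y_j \\
b_0 \sum x_{2j} + b_1 \sum x_{1j} x_{2j} + b_2 \sum x_{2j}^2 &= \sum x_{2j} y_j
\end{aligned} \tag{10.3.5}$$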


FIGURE 10.3.1 Scatter diagram and regression plane for multiple-regression model

We divide the first normal equation by n and solve for $b_0$ to get

$$b_0 = \frac{\sum y_j}{n} - b_1 \frac{\sum x_{1j}}{n} - b_2 \frac{\sum x_{2j}}{n} = \bar{y} - b_1 \bar{x}_1 - b_2 \bar{x}_2 \tag{10.3.6}$$

Substituting this result for $b_0$ in the second normal equation gives us

$$\frac{\sum x_{1j} \sum y_j}{n} - b_1 \frac{(\sum x_{1j})^2}{n} - b_2 \frac{\sum x_{1j} \sum x_{2j}}{n} + b_1 \sum x_{1j}^2 + b_2 \sum x_{1j} x_{2j} = \sum x_{1j} y_j$$

Rearranging terms, we get

$$b_1 \left[ \sum x_{1j}^2 - \frac{(\sum x_{1j})^2}{n} \right] + b_2 \left[ \sum x_{1j} x_{2j} - \frac{\sum x_{1j} \sum x_{2j}}{n} \right] = \sum x_{1j} y_j - \frac{\sum x_{1j} \sum y_j}{n} \tag{10.3.7}$$

Similarly, substituting for $b_0$ in the third normal equation and rearranging terms
yields

$$b_1 \left[ \sum x_{1j} x_{2j} - \frac{\sum x_{1j} \sum x_{2j}}{n} \right] + b_2 \left[ \sum x_{2j}^2 - \frac{(\sum x_{2j})^2}{n} \right] = \sum x_{2j} y_j - \frac{\sum x_{2j} \sum y_j}{n} \tag{10.3.8}$$


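We need the following sums of squares and cross products, computed from the
data of Table 10.3.1, to evaluate the normal equations for Example 10.3.1:

$$\begin{aligned}
\sum x_{1j}^2 - \frac{(\sum x_{1j})^2}{n} &= 22{,}429.15 - \frac{(739.5)^2}{25} = 554.74 \\
\sum x_{2j}^2 - \frac{(\sum x_{2j})^2}{n} &= 101{,}568.00 - \frac{(1576.6)^2}{25} = 2141.30 \\
\sum x_{1j} x_{2j} - \frac{\sum x_{1j} \sum x_{2j}}{n} &= 47{,}709.10 - \frac{(739.5)(1576.6)}{25} = 1073.27 \\
\sum x_{1j} y_j - \frac{\sum x_{1j} \sum y_j}{n} &= 2968.58 - \frac{(739.5)(98.2)}{25} = 63.82 \\
\sum x_{2j} y_j - \frac{\sum x_{2j} \sum y_j}{n} &= 6317.95 - \frac{(1576.6)(98.2)}{25} = 125.07
\end{aligned}$$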


We substitute the results of these computations into Equations 10.3.7 and 10.3.8
to obtain

$$554.74\, b_1 + 1073.27\, b_2 = 63.82, \qquad 1073.27\, b_1 + 2141.30\, b_2 = 125.07$$

We solve these simultaneous equations for $b_1$ and $b_2$ and obtain $b_1 = 0.06741$
and $b_2 = 0.02462$. We find the constant $b_0$ from the following relationship, as
shown in Equation 10.3.6:

$$b_0 = \bar{y} - b_1 \bar{x}_1 - b_2 \bar{x}_2$$

For our example, we have

$$b_0 = 3.928 - (0.06741)(29.58) - (0.02462)(63.064) = 0.381$$

Now, when we round to three places, we have our sample multiple-regression
equation:

$$\hat{y}_j = 0.381 + 0.067 x_{1j} + 0.025 x_{2j}$$
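
The hand calculation above can be reproduced with a few lines of Python. This is a sketch only: it takes the corrected sums of squares and cross products quoted in the text as given, solves the two simultaneous equations by Cramer's rule (our choice of method), and then recovers $b_0$ from Equation 10.3.6. The function and argument names are ours.

```python
def solve_two_predictors(s11, s22, s12, s1y, s2y, ybar, x1bar, x2bar):
    """Solve Equations 10.3.7 and 10.3.8 for b1 and b2, then get b0 from 10.3.6.

    s11, s22: corrected sums of squares of x1 and x2
    s12:      corrected sum of cross products of x1 and x2
    s1y, s2y: corrected sums of cross products of x1 and x2 with y
    """
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("x1 and x2 are perfectly collinear; no unique solution")
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = ybar - b1 * x1bar - b2 * x2bar
    return b0, b1, b2


# Quantities quoted in the text for Example 10.3.1 (n = 25).
n = 25
b0, b1, b2 = solve_two_predictors(
    s11=554.74, s22=2141.30, s12=1073.27, s1y=63.82, s2y=125.07,
    ybar=98.2 / n, x1bar=739.5 / n, x2bar=1576.6 / n,
)
print(f"b0 = {b0:.3f}, b1 = {b1:.5f}, b2 = {b2:.5f}")
```

The printed coefficients should agree with those in the text to the number of places shown there, apart from possible differences in the last digit caused by rounding of the intermediate sums.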

You can see that there would be a lot of work involved in obtaining the
regression equation when there are more than three variables. The best way to do the
computations you need in such a case is to use the abbreviated Doolittle method 
(1878). This method is illustrated for four independent variables by Anderson and 
Bancroft (1952), and for three by Snedecor and Cochran (1980) and Steel and 
Torrie (1980). 

A computer eliminates the computational burden involved in multiple-regression 
analysis, since you can get “canned” programs that will handle large sets of data 
and a large number of variables. The exact specifications for entering data and 
the kinds of output vary from one computer installation to another. In general, 
you specify the number of variables and the number of observations on each 
variable. You enter this information, along with the observations themselves, into 
the computer in various ways, including punched cards, paper and magnetic tape, 
and remote terminals. Many programs let you indicate, as part of the input, which 
of many output options you prefer. Generally the descriptive measures discussed 
in this chapter, as well as others, are available as output. 

In subsequent calculations the notation is greatly simplified if we transform
each variable into a deviation from its mean. We call these deviations
$y'_j$, $x'_{1j}$, and $x'_{2j}$. This procedure gives us

$$y'_j = y_j - \bar{y}, \qquad x'_{1j} = x_{1j} - \bar{x}_1, \qquad x'_{2j} = x_{2j} - \bar{x}_2 \tag{10.3.9}$$

Therefore we can rewrite the sums of squares and cross products in Equations
10.3.7 and 10.3.8 as follows:

$$\begin{aligned}
\sum (x'_{1j})^2 &= \sum (x_{1j} - \bar{x}_1)^2 = \sum x_{1j}^2 - \frac{(\sum x_{1j})^2}{n} \\
\sum (x'_{2j})^2 &= \sum (x_{2j} - \bar{x}_2)^2 = \sum x_{2j}^2 - \frac{(\sum x_{2j})^2}{n} \\
\sum x'_{1j} x'_{2j} &= \sum (x_{1j} - \bar{x}_1)(x_{2j} - \bar{x}_2) = \sum x_{1j} x_{2j} - \frac{\sum x_{1j} \sum x_{2j}}{n} \\
\sum x'_{1j} y'_j &= \sum (x_{1j} - \bar{x}_1)(y_j - \bar{y}) = \sum x_{1j} y_j - \frac{\sum x_{1j} \sum y_j}{n} \\
\sum x'_{2j} y'_j &= \sum (x_{2j} - \bar{x}_2)(y_j - \bar{y}) = \sum x_{2j} y_j - \frac{\sum x_{2j} \sum y_j}{n}
\end{aligned}$$


Exercises


Find the multiple-regression equation for the sample data in each of the following exercises.

10.3.1 In a study of the yield of a certain grain, in bushels per acre (Y), researchers obtain
the following data from 10 farms. Here $X_1$ is fertilizer applied, in pounds per acre, and
$X_2$ is an index of soil quality.


Y      50   52   56   59   62   64   68   69   70   71    Total = 621
X_1    38   39   39   41   44   42   43   46   48   47    Total = 427
X_2    50   50   54   56   56   60   64   63   62   60    Total = 575

$$\sum y_j^2 = 39{,}087 \qquad \sum x_{2j}^2 = 33{,}297 \qquad \sum x_{2j} y_j = 36{,}039$$

$$\sum x_{1j}^2 = 18{,}345 \qquad \sum x_{1j} y_j = 26{,}742 \qquad \sum x_{1j} x_{2j} = 24{,}682$$






10.3.2 The following table shows the scores (Y) made by 10 assembly-line employees on
a test designed to measure job satisfaction. It also shows the scores made on an aptitude
test ($X_1$) and the number of days absent ($X_2$) during the past year (excluding vacations).
All employees were on the payroll during the entire year.

Y      70   60   80   50   55   85   75   70   72   64    Total = 681
X_1     6    6    8    5    6    9    8    6    7    6    Total = 67
X_2     1    2    1    8    9    0    1    1    1    2    Total = 26

$$\sum y_j^2 = 47{,}455 \qquad \sum x_{2j}^2 = 158 \qquad \sum x_{2j} y_j = 1510$$

$$\sum x_{1j}^2 = 463 \qquad \sum x_{1j} y_j = 4673 \qquad \sum x_{1j} x_{2j} = 153$$




10.3.3 The following table shows, for a particular week, the sales (Y) of a certain product,
advertising expenditures ($X_1$), and population density ($X_2$) for 10 market areas. Sales and
advertising expenditures are in tens of thousands of dollars, and population density is in
people per square mile.

Y      20    25    24    30    32    40    28    50    40    50     Total = 339
X_1    0.2   0.2   0.2   0.3   0.3   0.4   0.3   0.5   0.4   0.5    Total = 3.3
X_2    50    50    50    60    60    70    50    75    70    74     Total = 609

$$\sum y_j^2 = 12{,}509 \qquad \sum x_{2j}^2 = 38{,}101 \qquad \sum x_{2j} y_j = 21{,}620$$

$$\sum x_{1j}^2 = 1.21 \qquad \sum x_{1j} y_j = 122.8 \qquad \sum x_{1j} x_{2j} = 211.5$$


10.4 EVALUATING THE REGRESSION EQUATION 

Before we can feel confident about using a sample multiple-regression equation 
for prediction and estimation, we must be assured that it adequately represents 
the relationship among the variables. The coefficient of multiple determination 
provides an overall measure of the adequacy of the equation. Also we can use 
analysis of variance and the F test as an overall significance test for regression. 
We can evaluate the importance of individual explanatory variables by examining 
the sample partial regression coefficients associated with each. 

In the rest of this section we present the formulas and techniques you need to 
evaluate the equation when you don’t have access to a computer. If you do have 


access to a computer, just study the concepts involved. Then you may wish to 
skip to the end of Section 10.5, where we discuss the computer analysis of
Example 10.3.1.


The Coefficient of Multiple Determination


The coefficient of multiple determination is defined as

$$R^2_{y.12 \ldots k} = \frac{\sum (\hat{y}_j - \bar{y})^2}{\sum (y_j - \bar{y})^2} = 1 - \frac{\sum (y_j - \hat{y}_j)^2}{\sum (y_j - \bar{y})^2} \tag{10.4.1}$$

The numerator of the middle term is the explained sum of squares, or the sum of
squares due to regression, as it is sometimes called. The denominator is the total
sum of squares. The subscript on $R^2$ indicates that Y is the dependent variable and
$X_1, X_2, \ldots, X_k$ are independent variables.

Chapter 9 showed that $r^2$, the coefficient of determination for simple linear
regression, is positively biased. Likewise, the coefficient of multiple determination
is a positively biased estimator of the population coefficient of multiple
determination. We can adjust for bias by computing

$$\bar{R}^2_{y.12 \ldots k} = 1 - \frac{\sum (y_j - \hat{y}_j)^2 / (n - k - 1)}{\sum (y_j - \bar{y})^2 / (n - 1)} \tag{10.4.1a}$$

Since the correction factor, $(n - 1)/(n - k - 1)$, approaches 1 as the size of the
sample increases, the difference between $R^2_{y.12 \ldots k}$ and $\bar{R}^2_{y.12 \ldots k}$ is negligible for
large samples.
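
The effect of the correction factor can be seen in a short Python sketch. The function names and the sums of squares are hypothetical; only the sample size varies, so the unadjusted coefficient stays fixed while the adjusted one moves toward it.

```python
def r_squared(sse, sst):
    """Unadjusted coefficient of multiple determination, 1 - SSE/SST."""
    return 1.0 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """Bias-adjusted coefficient: 1 - [SSE/(n - k - 1)] / [SST/(n - 1)]."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


# Hypothetical sums of squares with k = 2 independent variables.
sse, sst, k = 20.0, 100.0, 2
for n in (10, 100, 1000):
    gap = r_squared(sse, sst) - adjusted_r_squared(sse, sst, n, k)
    print(n, round(adjusted_r_squared(sse, sst, n, k), 4), round(gap, 4))
```

The gap between the two coefficients shrinks as n grows, which is the sense in which the adjustment is negligible for large samples.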

From Chapter 9, the total sum of squares is equal to the sum of the explained 
sum of squares and the unexplained or error sum of squares. We can express this 
relationship as 

$$\sum (y_j - \bar{y})^2 = \sum (\hat{y}_j - \bar{y})^2 + \sum (y_j - \hat{y}_j)^2 \tag{10.4.2}$$

or

$$SST = SSR + SSE \tag{10.4.3}$$


The degrees of freedom associated with the total, explained, and error sums of 
squares are n — 1, k, and n — k — 1, respectively. 

We can find the total sum of squares from 


$$SST = \sum (y_j - \bar{y})^2 = \sum y_j'^2 = \sum y_j^2 - \frac{(\sum y_j)^2}{n} \tag{10.4.4}$$


We can find the explained sum of squares from 

$$SSR = \sum (\hat{y}_j - \bar{y})^2 = b_1 \sum x_{1j}' y_j' + b_2 \sum x_{2j}' y_j' + \cdots + b_k \sum x_{kj}' y_j' \tag{10.4.5}$$

We can find the error sum of squares by subtraction. That is, 


$$SSE = SST - SSR \tag{10.4.6}$$



Let us again refer to Example 10.3.1 to see how to compute the various sums
of squares and $R^2$. Using data from Table 10.3.1, we use Equation 10.4.4 to get

$$SST = \sum (y_j - \bar{y})^2 = \sum y_j'^2 = 393.26 - \frac{(\sum y_j)^2}{25} = 7.53$$

Using Equation 10.4.5 and the results of the calculations we did in Section 10.3,
we find that

$$SSR = \sum (\hat{y}_j - \bar{y})^2 = 0.067(63.82) + 0.025(125.07) = 7.40$$

By subtraction, we obtain

$$SSE = \sum (y_j - \hat{y}_j)^2 = 7.53 - 7.40 = 0.13$$

We may now use Equation 10.4.1 and compute

$$R^2_{y.12} = \frac{SSR}{SST} = \frac{7.40}{7.53} = 0.9827 \approx 0.98$$


Thus, the regression of $Y$ on $X_1$ and $X_2$ explains 98% of the total variation in
$Y$. We also may interpret $R^2_{y.12}$ as a measure of the goodness of fit of the regression
plane to the observed points.
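
The arithmetic of this example can be checked with a short script. This is a minimal sketch rather than part of the original text: the function names are illustrative, the inputs are the rounded values quoted in this section, and n = 25 is inferred from the 24 total degrees of freedom in the analysis-of-variance table.

```python
# Sums of squares and R^2 for Example 10.3.1, from the rounded values in the text.
n, k = 25, 2                      # n is inferred from the 24 total degrees of freedom
b = [0.067, 0.025]                # sample partial regression coefficients b1, b2
sxy = [63.82, 125.07]             # sums of cross products of x1' and x2' with y'
sst = 7.53                        # total sum of squares (Equation 10.4.4)


def explained_ss(coefs, cross_products):
    """Equation 10.4.5: SSR = b1*sum(x1'y') + ... + bk*sum(xk'y')."""
    return sum(bi * s for bi, s in zip(coefs, cross_products))


def adjusted_r2(sst, sse, n, k):
    """Equation 10.4.1a: bias-adjusted coefficient of multiple determination."""
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


ssr = round(explained_ss(b, sxy), 2)   # rounded to two decimals, as in the text
sse = round(sst - ssr, 2)              # Equation 10.4.6
r2 = ssr / sst                         # Equation 10.4.1
r2_adj = adjusted_r2(sst, sse, n, k)
print(f"SSR={ssr:.2f}  SSE={sse:.2f}  R2={r2:.4f}  adjusted R2={r2_adj:.4f}")
```

With 25 observations and only two independent variables, the adjusted value comes out only slightly below the unadjusted one (about 0.981 against 0.983).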

We can display the results of these calculations in an analysis-of-variance table 
like Table 9.5.2. We then carry out an F test to determine whether the overall 
regression of Y on the independent variables is significant. We can state the 
hypotheses formally as follows: 

$H_0$: there is no linear relationship between $Y$ and the set of independent variables
$H_1$: there is a linear relationship between $Y$ and the set of independent variables

Table 10.4.1 shows the ANOVA table for the newspaper circulation data of
Example 10.3.1.

We see that the computed F is considerably larger than 5.72, the tabulated F 
for a = 0.01 and 2 and 22 degrees of freedom. Therefore, we would reject the 
hypothesis of no regression. We would conclude that the data at hand provide 
evidence to support, at the 0.01 level of significance, the contention that there is 
a linear regression between Y and the two independent variables (p < 0.005). 
When the analysis of variance leads to a significant computed F, we say that a
significant proportion of the variation in $Y$ is explained by the regression of $Y$ on
the independent variables. Or we simply say that $R^2_{y.12\ldots k}$ is significant.


TABLE 10.4.1 ANOVA table, Example 10.3.1

Source of variation      SS     df       MS         F
Regression              7.40     2     3.7000     627.12
Error                   0.13    22     0.0059
Total                   7.53    24
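
The entries of Table 10.4.1 follow mechanically from the sums of squares and their degrees of freedom. The sketch below is illustrative only: `anova_f` is a hypothetical helper, and the critical value 5.72 is simply the tabulated F quoted in the text, not something the code derives.

```python
def anova_f(ssr, sse, n, k):
    """Return (MSR, MSE, F) for the overall F test of the regression."""
    msr = ssr / k                 # regression mean square, k degrees of freedom
    mse = sse / (n - k - 1)       # error mean square, n - k - 1 degrees of freedom
    return msr, mse, msr / mse


msr, mse, f_unrounded = anova_f(7.40, 0.13, n=25, k=2)
f_table = msr / round(mse, 4)     # the table rounds MSE to 0.0059 before dividing
F_CRITICAL = 5.72                 # tabulated F for alpha = 0.01 with 2 and 22 df (from the text)
reject_h0 = f_table > F_CRITICAL
print(round(f_table, 2), round(f_unrounded, 2), reject_h0)
```

Rounding the error mean square before dividing accounts for the small gap between the tabled F of 627.12 and the value obtained without intermediate rounding; the conclusion is the same either way.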



Inferences About Individual Partial Regression Coefficients

We can make inferences about individual population partial regression coefficients
when the assumptions in Section 10.2 apply. When these assumptions hold, the
$b_j$ are each normally distributed, with mean $\beta_j$ and variance $c_{jj}\sigma^2$ ($c_{jj}$ is defined
below). Since $\sigma^2$, the population variance in $Y$, is usually unknown, we use its
estimate $s^2_{y.12\ldots k}$. To test the null hypothesis that $\beta_j$ is equal to some particular
value $\beta_{j0}$, we use the following test statistic:

$$t = \frac{b_j - \beta_{j0}}{s_{b_j}} \tag{10.4.7}$$

where

$$s_{b_j} = s_{y.12\ldots k}\sqrt{c_{jj}} \tag{10.4.8}$$


The test statistic is distributed as Student’s t with $n - k - 1$ degrees of freedom.
In Equation 10.4.8, $s_{y.12\ldots k}$ is the square root of the unexplained variance, or the
error mean square, from the analysis-of-variance table.

The quantity $c_{jj}$ in Equation 10.4.8 is called a Gauss multiplier. When the
analysis involves only three variables, we may obtain it by inverting the matrix
of sums of squares and cross products that we can construct from the left-hand
terms of the normal equations. The $c_{jj}$ are the diagonal elements of the inverse.
We may obtain the c values by solving the following two sets of equations:

$$c_{11}\sum x_{1j}'^2 + c_{12}\sum x_{1j}'x_{2j}' = 1, \qquad c_{11}\sum x_{1j}'x_{2j}' + c_{12}\sum x_{2j}'^2 = 0 \tag{10.4.9}$$

$$c_{21}\sum x_{1j}'^2 + c_{22}\sum x_{1j}'x_{2j}' = 0, \qquad c_{21}\sum x_{1j}'x_{2j}' + c_{22}\sum x_{2j}'^2 = 1 \tag{10.4.10}$$

where $c_{12} = c_{21}$. For a larger number of independent variables, we expand
Equations 10.4.9 and 10.4.10 so that there are as many sets of equations as there are
independent variables and an equal number of individual equations within each
set. We place a 1 to the right of the equal sign in every equation that contains a
term of the form $c_{ii}\sum x_{ij}'^2$. In Equation 10.4.9, for example, there’s a 1 to the right
of the equal sign in the equation containing $c_{11}\sum x_{1j}'^2$. Ezekiel and Fox (1959) have
written out the equations for the case of three independent variables. We may also
use the abbreviated Doolittle method, mentioned earlier, to find the c values.

We may find the c values for our newspaper-circulation example by substituting 
the results of the calculations we performed in Section 10.3 into Equations 10.4.9 
and 10.4.10. The equations for the example are 

$$554.74c_{11} + 1073.27c_{12} = 1, \qquad 1073.27c_{11} + 2141.30c_{12} = 0$$

$$554.74c_{21} + 1073.27c_{22} = 0, \qquad 1073.27c_{21} + 2141.30c_{22} = 1$$

When we solve these equations, we find 

$$c_{11} = 0.0595529, \qquad c_{12} = c_{21} = -0.0298493, \qquad c_{22} = 0.0154282$$
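
Solving these two sets of equations is the same as inverting the 2 × 2 matrix of sums of squares and cross products, as the text notes. A minimal sketch follows, using the closed-form inverse of a symmetric 2 × 2 matrix; the function name is illustrative and not from the original.

```python
def gauss_multipliers(s11, s12, s22):
    """Invert the symmetric matrix [[s11, s12], [s12, s22]].

    s11 and s22 are the sums of squared deviations of x1 and x2;
    s12 is the sum of their cross products. Returns (c11, c12, c22).
    """
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("matrix is singular; the x variables are perfectly collinear")
    return s22 / det, -s12 / det, s11 / det


c11, c12, c22 = gauss_multipliers(554.74, 1073.27, 2141.30)
print(f"c11={c11:.7f}  c12={c12:.7f}  c22={c22:.7f}")

# Check against the first equation of each set (10.4.9 and 10.4.10).
check_one = 554.74 * c11 + 1073.27 * c12    # should be 1
check_zero = 554.74 * c12 + 1073.27 * c22   # should be 0
```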

Now that we have the c values, we can compute the standard errors of the $b_j$.
This will enable us to construct confidence intervals for and test hypotheses about
individual $\beta$’s.



To illustrate inferential procedures with respect to the $\beta_j$, let’s take the
newspaper-circulation example. Test the null hypothesis that $\beta_1 = 0$ against the
alternative that $\beta_1 \neq 0$. The hypotheses, stated formally, are

$$H_0\colon \beta_1 = 0, \qquad H_1\colon \beta_1 \neq 0$$

Suppose that $\alpha = 0.05$. Using Equation 10.4.7, we can compute the following
value of the test statistic:

$$t = \frac{b_1 - 0}{s_{y.12}\sqrt{c_{11}}} = \frac{0.067 - 0}{\sqrt{0.0059}\,\sqrt{0.0595529}} = 3.57$$

Since 3.57 is greater than the tabulated value of t, 2.0739, for $\alpha = 0.05$ and 22
degrees of freedom, we reject $H_0$. We conclude that $X_1$, total retail sales, is linearly
related to newspaper circulation (p < 0.01). We conclude that this variable in the
presence of $X_2$ is useful in predicting and estimating the dependent variable.

We can carry out a similar test for $\beta_2$. Suppose, for this example, that we test

$$H_0\colon \beta_2 = 0 \quad \text{against} \quad H_1\colon \beta_2 \neq 0$$

at the 0.05 level of significance. The computed value of the test statistic is

$$t = \frac{0.025 - 0}{\sqrt{0.0059}\,\sqrt{0.0154282}} = 2.62$$

Since the computed t of 2.62 is greater than 2.0739, we again reject the null
hypothesis. We conclude that population density, when it is used as an explanatory
variable in the presence of $X_1$, has a significant influence on weekend circulation
of daily newspapers (0.01 < p < 0.02).

We can construct confidence intervals for the $\beta_j$ from the general formula

Estimate ± (reliability factor) × (standard error)

That is,

$$b_j \pm t_{(1-\alpha/2),\,n-k-1}\; s_{y.12\ldots k}\sqrt{c_{jj}}$$

For our newspaper-circulation example, the 95% confidence interval for $\beta_1$ is

$$0.067 \pm 2.0739\,\sqrt{0.0059}\,\sqrt{0.0595529}$$

$$0.067 \pm 0.039, \qquad 0.028, \quad 0.106$$

We interpret this interval in the usual ways. 
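
The two t tests and the confidence interval above use the same ingredients: the coefficient, the error mean square, and the matching Gauss multiplier. The following sketch recomputes them; the function names are illustrative, and the critical value 2.0739 is taken from the text rather than computed.

```python
import math

MSE = 0.0059        # error mean square from Table 10.4.1
T_CRIT = 2.0739     # tabulated t for alpha = 0.05 (two-sided) and 22 df, from the text


def standard_error(mse, c_jj):
    """Equation 10.4.8: s_bj = s_y.12...k * sqrt(c_jj)."""
    return math.sqrt(mse) * math.sqrt(c_jj)


def t_statistic(b_j, beta_j0, mse, c_jj):
    """Equation 10.4.7: t = (b_j - beta_j0) / s_bj."""
    return (b_j - beta_j0) / standard_error(mse, c_jj)


def confidence_interval(b_j, t_crit, mse, c_jj):
    """b_j plus or minus t_crit * s_bj."""
    half_width = t_crit * standard_error(mse, c_jj)
    return b_j - half_width, b_j + half_width


t1 = t_statistic(0.067, 0.0, MSE, 0.0595529)
t2 = t_statistic(0.025, 0.0, MSE, 0.0154282)
lower, upper = confidence_interval(0.067, T_CRIT, MSE, 0.0595529)
print(round(t1, 2), round(t2, 2), round(lower, 3), round(upper, 3))
```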

In testing hypotheses about all the $\beta_j$, the same problem arises concerning
multiple tests using the same data that arose in the discussion of analysis of
variance. If multiple tests are performed, the actual level of significance is
generally larger than that stated. We have the same problem when we construct
confidence intervals. If we construct them for more than one $\beta_j$, they will not be
independent. Thus the tabulated confidence coefficient is not appropriate. When
we want confidence intervals for two or more partial regression coefficients, we
can follow a procedure suggested by Durand (1954).


Note that $R^2_{y.12}$ may be significant when only one or neither of the partial
regression coefficients is significant. In fact, in multiple-regression analysis, Geary
and Leser (1968) point out that any one of the following situations may arise:

1. $R^2$ and all $b_j$ significant

2. $R^2$ and some but not all $b_j$ significant

3. $R^2$ but none of the $b_j$ significant

4. All $b_j$ significant but not $R^2$

5. Some $b_j$ significant but not all nor $R^2$

6. Neither $R^2$ nor any $b_j$ significant

Geary and Leser discuss circumstances under which each situation may occur and
offer suggestions for interpreting such results.

Exercises

10.4.1 Refer to Exercise 10.3.1. (a) Calculate the coefficient of multiple determination.
(b) Perform an analysis of variance. (c) Test the significance of each $b_j$. (d) Construct a
confidence interval for at least one $\beta_j$. Let $\alpha = 0.05$ for all tests of significance and
confidence intervals. Determine the p value for each test.

10.4.2 Refer to Exercise 10.3.2. Do the analysis suggested in Exercise 10.4.1.

10.4.3 Refer to Exercise 10.3.3. Do the analysis suggested in Exercise 10.4.1.


10.5 USING THE SAMPLE MULTIPLE-REGRESSION EQUATION

We use the sample multiple-regression equation for the same purposes as the
simple linear regression equation: (1) to predict the value Y is likely to assume
for given values of the independent variables, and (2) to estimate the mean of the
subpopulation of Y values for a given combination of values of the independent
variables.

When the assumptions of Section 10.2 are met, we may also construct prediction
intervals and confidence intervals by methods that are straightforward extensions
of those in Chapter 9.

Predicting Y for Given Values of the Independent Variables

To predict the value Y is likely to assume for given values of the independent 
variables, we substitute the values of interest into the regression equation. The 
100(1 - a)% prediction interval for the three-variable case is given by 


$$\hat{y}_j \pm t_{(1-\alpha/2),\,n-k-1}\; s_{y.12}\sqrt{1 + \frac{1}{n} + c_{11}x_{1j}'^2 + c_{22}x_{2j}'^2 + 2c_{12}x_{1j}'x_{2j}'} \tag{10.5.1}$$

To illustrate the use of the regression equation for predicting purposes, let us go 
back to our newspaper-circulation example. The equation is 

$$\hat{y}_j = 0.381 + 0.067x_{1j} + 0.025x_{2j}$$

Suppose that, from the population of trade areas from which the sample was
drawn, we draw another trade area with total retail sales of $25,000,000 and
population density of 52 persons per square mile. We wish to predict the weekend
circulation of daily newspapers for this trade area.

Substituting $X_1 = 25$ and $X_2 = 52$ into the regression equation gives

$$\hat{y} = 0.381 + (0.067)(25) + (0.025)(52) = 3.356$$

Our prediction of the weekend circulation of daily newspapers for this trade area
is 3356.

We make appropriate substitutions into Equation 10.5.1 to find the following 
95% prediction interval: 

$$3.356 \pm 2.0739\left(\sqrt{0.0059}\right)\sqrt{1 + \tfrac{1}{25} + (0.0595529)(-4.58)^2 + 0.0154282(-11.064)^2 + 2(-0.0298493)(-4.58)(-11.064)}$$

$$3.356 \pm 2.0739(0.0768)(1.0736)$$

$$3.356 \pm 0.171, \qquad 3.185, \quad 3.527$$

Our interpretation of this interval is as follows: Given a trade area with annual 
sales of $25,000,000 and a population density of 52 people per square mile, we 
are 95% confident that the weekend circulation of daily newspapers in the area is 
between 3185 and 3527. We say this because we know that if we fit a regression 
plane to sample data from this population repeatedly and predict circulation in a 
trade area in a similar manner, we expect 95% of such intervals to include the 
trade area’s true circulation. 
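
The point prediction and the prediction interval can be reproduced in a few lines. This is a sketch under the values quoted in the text; the constant and function names are illustrative, and the deviations −4.58 and −11.064 are the differences between the new X values and the sample means, as used in the calculation above.

```python
import math

B0, B1, B2 = 0.381, 0.067, 0.025                   # sample regression coefficients
MSE, N = 0.0059, 25                                # error mean square and sample size
C11, C12, C22 = 0.0595529, -0.0298493, 0.0154282   # Gauss multipliers
T_CRIT = 2.0739                                    # tabulated t, 95%, 22 df (from the text)


def predict(x1, x2):
    """Point prediction from the sample regression equation."""
    return B0 + B1 * x1 + B2 * x2


def prediction_half_width(d1, d2):
    """Half-width of the prediction interval, Equation 10.5.1.

    d1 and d2 are the deviations of the new x values from their sample means.
    """
    q = 1 + 1 / N + C11 * d1 ** 2 + C22 * d2 ** 2 + 2 * C12 * d1 * d2
    return T_CRIT * math.sqrt(MSE) * math.sqrt(q)


y_hat = predict(25, 52)
half = prediction_half_width(-4.58, -11.064)
print(round(y_hat, 3), round(y_hat - half, 3), round(y_hat + half, 3))
```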


Estimating the Mean of a Subpopulation of Y Values


To get a point estimate of the mean of a subpopulation of Y values, we substitute 
the X values into the sample regression equation. The numerical value of the 
estimate will be the same as that of the prediction. 

The 100(1 — a)% confidence interval for the mean of a subpopulation of 
Y values for the three-variable case is given by 


$$\hat{y}_j \pm t_{(1-\alpha/2),\,n-k-1}\; s_{y.12}\sqrt{\frac{1}{n} + c_{11}x_{1j}'^2 + c_{22}x_{2j}'^2 + 2c_{12}x_{1j}'x_{2j}'} \tag{10.5.2}$$


Suppose, for our newspaper-circulation example, that we wish to estimate the 
mean circulation for all trade areas with total retail sales of $25,000,000 and a 
population density of 52 persons per square mile. As we have seen, the point 
estimate is 3356. Equation 10.5.2 shows the 95% confidence interval for this 
subpopulation mean to be 


$$3.356 \pm 2.0739(0.0768)\sqrt{\tfrac{1}{25} + (0.0595529)(-4.58)^2 + 0.0154282(-11.064)^2 + 2(-0.0298493)(-4.58)(-11.064)}$$

$$3.356 \pm 0.062, \qquad 3.294, \quad 3.418$$
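
The terms under the square root in Equation 10.5.2 form a quadratic form in the deviations, with the Gauss multipliers as its matrix. Writing it that way gives one function that works for any number of independent variables. The sketch below is an illustration under that reading, not the general formula cited from another source; the function name is hypothetical.

```python
import math


def mean_ci_half_width(t_crit, mse, n, c, d):
    """Half-width of the confidence interval for a subpopulation mean.

    c is the k x k matrix of Gauss multipliers and d the list of deviations
    of the chosen x values from their sample means. For k = 2 the double sum
    reduces to c11*d1^2 + c22*d2^2 + 2*c12*d1*d2, as in Equation 10.5.2.
    """
    k = len(d)
    quad = sum(c[i][j] * d[i] * d[j] for i in range(k) for j in range(k))
    return t_crit * math.sqrt(mse * (1 / n + quad))


c = [[0.0595529, -0.0298493],
     [-0.0298493, 0.0154282]]
half = mean_ci_half_width(2.0739, 0.0059, 25, c, [-4.58, -11.064])
lower, upper = 3.356 - half, 3.356 + half
print(round(half, 3), round(lower, 3), round(upper, 3))
```

The interval for the mean is much narrower than the prediction interval at the same X values because it omits the extra 1 under the square root, which accounts for the variability of a single new observation.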


We interpret this interval as follows: We are 95% confident that the mean of
the subpopulation of Y for the specified combination of X’s is between 3294 and
3418. We can say this because, if we repeated the process of fitting a regression
plane to sample data from this population many times and constructed confidence
intervals in the way described, in the long run 95% of these intervals would include
the true mean.

We may generalize these formulas for confidence intervals and prediction
intervals to accommodate any number of independent variables. [Anderson and
Bancroft (1952) give general formulas.]

Use of Computers

Chapter 9 mentioned the advantages of using computers in simple linear-regression
analysis. These advantages are even greater in multiple-regression analysis. As
the number of independent variables increases, the number and complexity of the
calculations increase very rapidly. If the only computational aid you have is a
desk calculator or a hand-held calculator, you can easily become discouraged when
you have to carry out a regression analysis involving several independent
variables. On the other hand, using a computer lets you include many independent
variables in the analysis when other considerations suggest their inclusion. With
a computer, you can escape the drudgery of monotonous calculations. You can
redirect your energies to the more interesting and challenging jobs of improving
the quality of the data and the proper interpretation of the analysis. Canned
programs, or programs that are written in-house, can print out the analysis in concise
summary form. Figure 10.5.1 shows such a printout. The input for the analysis
was the data of Example 10.3.1, shown in the second, third, and fourth columns
of Table 10.3.1.

On the computer printout, the dependent variable $Y$ and the independent
variables $X_1$ and $X_2$ are referred to as variables 1, 2, and 3, respectively. Also
remember, for example, that .681591E-01 is read as 0.0681591. The correlation
matrix on the printout shows the simple correlation coefficients for each pair of
variables. The simple correlation between $Y$ and $X_1$, for example, is 0.987518,
and for $Y$ and $X_2$, $r = 0.984833$. The correlation coefficients are repeated below
the diagonal of 1’s, but they disagree slightly with those above the diagonal
because of rounding error.

The “Actual versus calculated” values are the observed and calculated values
of $Y$. For each observed (actual) value of $Y$ ($y_j$), the computer substitutes the
accompanying observed values of $X_1$ and $X_2$ into the regression equation to obtain
$\hat{y}_j$, the calculated value of $Y$. The “Difference” column contains the residuals
$y_j - \hat{y}_j$. The analysis-of-variance table appears near the end of the computer
printout. Finally the computer does the calculations necessary to test the null
hypothesis that each slope coefficient ($\beta_j$) is equal to 0.

The results shown on the computer printout differ from those given earlier in
the text because of rounding errors.

Exercises

Construct 95% confidence and prediction intervals for the specified X values.

10.5.1 Refer to Exercise 10.3.1 and let $x_{1j} = 40$ and $x_{2j} = 50$.

10.5.2 Refer to Exercise 10.3.2 and let $x_{1j} = 5$ and $x_{2j} = 1$.

10.5.3 Refer to Exercise 10.3.3 and let $x_{1j} = 0.2$ and $x_{2j} = 60$.

10.6 THE MULTIPLE-CORRELATION MODEL


We have shown that the purpose of correlation analysis is to assess the strength
of the relationship among several variables. The model that we assume in
multiple-correlation analysis is the same as that given by the multiple-regression model of
Equation 10.2.1. There is an important difference, however, in the assumptions
underlying the two models.

In the correlation model, we assume that all the variables, not just Y, are
random. If inferential procedures are an objective of the analysis, we must assume
that the joint distribution of the variables is normal. We call such a distribution a
multivariate normal distribution. To show all the descriptive and inferential
methods, we assume a multivariate normal distribution in the exercises and examples
that follow.

All the other assumptions specified in Section 10.2 also apply to the correlation
model. However, the concept of dependent and independent variables is not
appropriate under the correlation model.

When the correlation model applies, we can carry out all the analyses we have
discussed so far in this chapter. In addition, under this model, we can extend the
analysis in two ways: (1) we can estimate the overall strength of the correlation
between variables, and (2) we can assess the strength between combinations of
selected variables, when the effect of all other variables has been removed.

The Coefficient of Multiple Correlation

We can give the quantity $R^2_{y.12\ldots k}$ under the correlation model an additional
interpretation. We call the square root of $R^2_{y.12\ldots k}$ the sample coefficient of multiple
correlation and interpret it as a measure of the correlation between sample values
of $Y$ and all the $X_i$. We may also think of $R_{y.12\ldots k}$ as a measure of the correlation
between observed values of $Y$ and calculated values of $Y$. An estimate of the
population coefficient of multiple correlation, $\rho_{y.12\ldots k}$, is provided by $R_{y.12\ldots k}$. If
we wish to know if sample data provide sufficient information to justify the
conclusion that a set of variables is correlated, we test the null hypothesis that
$\rho_{y.12\ldots k} = 0$ against the alternative that $\rho_{y.12\ldots k} \neq 0$. The test statistic is

$$F = \frac{R^2_{y.12\ldots k}}{1 - R^2_{y.12\ldots k}} \cdot \frac{n - k - 1}{k} \tag{10.6.1}$$


When the null hypothesis is true, this statistic is distributed as F with k and n - 
k — 1 degrees of freedom. This test is exactly equivalent to the F test for overall 
regression discussed in Section 10.4. 
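
Equation 10.6.1 and its equivalence to the analysis-of-variance F test can be sketched as follows. The function name is illustrative; the inputs are the newspaper-circulation values from Section 10.4 (R² = 0.9827, SSR = 7.40, SSE = 0.13, SST = 7.53, n = 25, k = 2).

```python
def multiple_correlation_f(r2, n, k):
    """Equation 10.6.1: F = [R^2 / (1 - R^2)] * [(n - k - 1) / k]."""
    return (r2 / (1.0 - r2)) * ((n - k - 1) / k)


# Using the four-decimal R^2 quoted in the text.
f_from_r2 = multiple_correlation_f(0.9827, n=25, k=2)

# Using R^2 = SSR/SST without rounding reproduces the ANOVA ratio MSR/MSE exactly,
# which is the sense in which the two tests are equivalent.
f_exact = multiple_correlation_f(7.40 / 7.53, n=25, k=2)
f_anova = (7.40 / 2) / (0.13 / 22)
print(round(f_from_r2, 2), round(f_exact, 2), round(f_anova, 2))
```

Any difference between the F obtained here and the F in an analysis-of-variance table therefore comes only from rounding R² or the mean squares along the way.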

In our newspaper-circulation example, let us assume that the correlation model
applies. This is not an unreasonable assumption, as the sample of trade areas was
randomly selected (we did not single out specific values of the X’s for study).
There seems to be no reason to believe that there’s anything inherent in any of
the variables that would keep them from being random.

We may use the data from this example, then, to illustrate the calculation of
the coefficient of multiple correlation and the test of the null hypothesis that
$\rho_{y.12} = 0$. We have already found the coefficient of multiple determination for
this example to be 0.9827 ≈ 0.98. The square root of this gives

$$R_{y.12} = \sqrt{0.98} = 0.99$$

The test statistic for the hypothesis test, from Equation 10.6.1, is

$$F = \frac{(0.9827)(22)}{(1 - 0.9827)(2)} = 624.84$$

We definitely reject the hypothesis, since the probability of obtaining a value of
F as large as or larger than 624.84 when the null hypothesis is true is extremely
small. Note that, because of rounding errors, the value of F differs from that
obtained in the analysis-of-variance table.

Partial Correlation

In multiple-correlation analysis, we often want to be able to compute some measure 
of the contribution of individual variables when they are considered one at a time 
with the other variables held constant. We call such measures partial correlation 
coefficients. In computing the partial correlation coefficient between two variables, 
we want to eliminate the influence of all the other variables. 

Before we compute the partial correlation coefficients, we must first compute
simple correlation coefficients. For the three-variable case, there are three simple
sample correlation coefficients, as follows:

$r_{y1}$, the simple correlation between $Y$ and $X_1$

$r_{y2}$, the simple correlation between $Y$ and $X_2$

$r_{12}$, the simple correlation between $X_1$ and $X_2$

We can compute the simple correlation coefficients in terms of deviations from
means, as follows:

$$r_{y1} = \frac{\sum x_{1j}'y_j'}{\sqrt{\sum x_{1j}'^2 \sum y_j'^2}} \tag{10.6.2}$$

$$r_{y2} = \frac{\sum x_{2j}'y_j'}{\sqrt{\sum x_{2j}'^2 \sum y_j'^2}} \tag{10.6.3}$$

$$r_{12} = \frac{\sum x_{1j}'x_{2j}'}{\sqrt{\sum x_{1j}'^2 \sum x_{2j}'^2}} \tag{10.6.4}$$

We can calculate these coefficients using the data of the newspaper-circulation
example. We can get the numerical values we need from Sections 10.3 and 10.4.

$$r_{y1} = \frac{63.82}{\sqrt{(554.74)(7.53)}} = 0.99$$

$$r_{y2} = \frac{125.07}{\sqrt{(2141.30)(7.53)}} = 0.98$$

$$r_{12} = \frac{1073.27}{\sqrt{(554.74)(2141.30)}} = 0.98$$
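
The three simple correlations share one formula, so a single helper covers Equations 10.6.2 through 10.6.4. This is a minimal sketch; `simple_r` is an illustrative name, and the sums are the deviation sums of squares and cross products quoted in Sections 10.3 and 10.4.

```python
import math


def simple_r(s_ab, s_aa, s_bb):
    """Simple correlation from deviation sums: s_ab / sqrt(s_aa * s_bb)."""
    return s_ab / math.sqrt(s_aa * s_bb)


SUM_X1_SQ, SUM_X2_SQ, SUM_Y_SQ = 554.74, 2141.30, 7.53   # sums of squared deviations
SUM_X1_Y, SUM_X2_Y, SUM_X1_X2 = 63.82, 125.07, 1073.27   # sums of cross products

r_y1 = simple_r(SUM_X1_Y, SUM_X1_SQ, SUM_Y_SQ)
r_y2 = simple_r(SUM_X2_Y, SUM_X2_SQ, SUM_Y_SQ)
r_12 = simple_r(SUM_X1_X2, SUM_X1_SQ, SUM_X2_SQ)
print(round(r_y1, 2), round(r_y2, 2), round(r_12, 2))
```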

The partial correlation coefficients that we can compute now are:

1. The partial correlation between $Y$ and $X_1$ when $X_2$ is held constant:

$$r_{y1.2} = \frac{r_{y1} - r_{y2}r_{12}}{\sqrt{(1 - r_{y2}^2)(1 - r_{12}^2)}} \tag{10.6.5}$$

2. The partial correlation between $Y$ and $X_2$ when $X_1$ is held constant:

$$r_{y2.1} = \frac{r_{y2} - r_{y1}r_{12}}{\sqrt{(1 - r_{y1}^2)(1 - r_{12}^2)}} \tag{10.6.6}$$

3. The partial correlation between $X_1$ and $X_2$ when $Y$ is held constant:

$$r_{12.y} = \frac{r_{12} - r_{y1}r_{y2}}{\sqrt{(1 - r_{y1}^2)(1 - r_{y2}^2)}} \tag{10.6.7}$$

Using the newspaper-circulation data, we have

$$r_{y1.2} = \frac{0.99 - (0.98)(0.98)}{\sqrt{(1 - 0.98^2)(1 - 0.98^2)}} = 0.75$$

$$r_{y2.1} = \frac{0.98 - (0.99)(0.98)}{\sqrt{(1 - 0.99^2)(1 - 0.98^2)}} = 0.35$$

$$r_{12.y} = \frac{0.98 - (0.99)(0.98)}{\sqrt{(1 - 0.99^2)(1 - 0.98^2)}} = 0.35$$


For any one of these partial correlation coefficients, we may use the t test to
test the null hypothesis that the corresponding population partial correlation
coefficient is 0. For example, to test $H_0\colon \rho_{y1.2\ldots k} = 0$, we compute

$$t = r_{y1.2\ldots k}\sqrt{\frac{n - k - 1}{1 - r^2_{y1.2\ldots k}}}$$

We compare the computed t with the critical value of t corresponding to the chosen
level of significance and $n - k - 1$ degrees of freedom.

We can use the newspaper-circulation data to illustrate the procedure.

$$H_0\colon \rho_{y1.2} = 0, \qquad H_1\colon \rho_{y1.2} \neq 0$$

Let $\alpha = 0.05$.

$$t = 0.75\sqrt{\frac{25 - 2 - 1}{1 - 0.75^2}} = 5.32$$

Since 5.32 > 2.074, the tabulated value of t for a = 0.05 and 22 degrees of 
freedom, we reject the null hypothesis (p < 0.01). 
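
The three partial correlations and the t test can be reproduced with one function each. This is a sketch with illustrative names. It follows the text in starting from the two-decimal simple correlations; because the numerator is a small difference between nearly equal quantities, the result is sensitive to that rounding, so the sketch also shows the value obtained when the simple correlations are carried at full precision (about 0.58 rather than 0.75).

```python
import math


def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation between a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def partial_t(r, n, k):
    """t statistic for H0: population partial correlation = 0, with n - k - 1 df."""
    return r * math.sqrt((n - k - 1) / (1 - r ** 2))


# Two-decimal simple correlations, as used in the text.
r_y1, r_y2, r_12 = 0.99, 0.98, 0.98
r_y1_2 = partial_r(r_y1, r_y2, r_12)      # Equation 10.6.5
r_y2_1 = partial_r(r_y2, r_y1, r_12)      # Equation 10.6.6
r_12_y = partial_r(r_12, r_y1, r_y2)      # Equation 10.6.7
t = partial_t(round(r_y1_2, 2), n=25, k=2)

# Same coefficient from unrounded simple correlations, for comparison.
full_y1 = 63.82 / math.sqrt(554.74 * 7.53)
full_y2 = 125.07 / math.sqrt(2141.30 * 7.53)
full_12 = 1073.27 / math.sqrt(554.74 * 2141.30)
r_y1_2_full = partial_r(full_y1, full_y2, full_12)
print(round(r_y1_2, 2), round(r_y2_1, 2), round(r_12_y, 2), round(t, 2))
```

When the simple correlations are all close to 1, as they are here, it is worth carrying extra decimal places before computing partial correlations by hand.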

We have limited this illustration to the three-variable case under the correlation 
model. But the concepts and techniques of analysis extend logically to four or 
more variables. Needless to say, the complexity of computation increases greatly 
as the number of variables increases. 



In each of the following exercises: (a) Compute the multiple-correlation coefficient. (b)
Test $H_0$ that $\rho_{y.12} = 0$. (c) Compute the partial correlation coefficients. (d) Test $H_0$ that
$\rho_{y1.2} = 0$. (Let $\alpha = 0.05$ for all tests and determine the p value.)

10.6.1 A lumber company hires forestry specialists to study timber yield on 10 plots of
ground of equal size. Three measurements are taken on each plot: age of trees in years
($X_1$), a measure of fertility ($X_2$), and yield per acre ($Y$). The following data are collected.

Plot     1     2     3     4     5     6     7     8     9    10   Totals
X1      15    20    24    38    44    48    56    62    63    64     434
X2      60    76    84    85    86    85    86    86    87    87     822
Y      0.8   1.2   2.3   3.2   4.3   4.8   5.0   5.5   6.3   7.2    40.6

$\sum x_{1j}^2 = 21{,}930$, $\sum y_j^2 = 205.92$, $\sum x_{2j}y_j = 3459.7$

$\sum x_{2j}^2 = 68{,}208$, $\sum x_{1j}y_j = 2111.1$, $\sum x_{1j}x_{2j} = 36{,}727$

10.6.2 In a study of urban housing, the following measurements are made on 15
low-income residential areas. Here $X_1$ denotes average family income (× $100), $X_2$ denotes
number of miles from the central city, and $Y$ denotes average monthly rental for a
one-bedroom dwelling.

Area      1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   Totals
X1, $    34   37   39   42   41   45   40   52   50   62   68   65   70   68   75      788
X2      7.5  6.3  5.0  3.5  4.8  4.2  4.5  3.8  4.0  3.4  3.9  3.0  4.0  3.8  2.5     64.2
Y, $    120  130  135  138  142  149  155  158  160  169  170  175  182  190  195    2,368

$\sum x_{1j}^2 = 44{,}162$, $\sum y_j^2 = 380{,}858$, $\sum x_{2j}y_j = 9822.4$

$\sum x_{2j}^2 = 297.02$, $\sum x_{1j}y_j = 128{,}592$, $\sum x_{1j}x_{2j} = 3190.4$

10.6.3 A city treasurer is studying factors thought to be related to per capita public
expenditures. The treasurer collects data for 11 wards in the city. In the following table, the
wards are numbered 1 through 11. $Y$ denotes annual per capita public expenditure
(× $1000), $X_1$ denotes population density in persons per square mile, and $X_2$ denotes annual
per capita income (× $10,000).

Ward      1     2     3     4     5     6     7     8     9    10    11   Totals
Y      0.02  0.06  0.04  0.05  0.03  0.05  0.04  0.05  0.03  0.04  0.03     0.44
X1      133     7    61    15    32    32    19    12    64    29    99      503
X2      1.6   1.9   1.8   2.0   1.5   2.0   1.7   1.5   2.1   1.4   2.2     19.7

$\sum x_{1j}^2 = 38{,}975$, $\sum y_j^2 = 0.019$, $\sum x_{2j}y_j = 0.791$

$\sum x_{2j}^2 = 36.01$, $\sum x_{1j}y_j = 16.24$, $\sum x_{1j}x_{2j} = 921$

10.7 CHOOSING INDEPENDENT VARIABLES FOR THE REGRESSION EQUATION


When we use regression analysis, we must decide which independent variables to
include in the model. Usually, we base the decision on both statistical and
nonstatistical considerations. We may have to omit some variables because of the
difficulty or expense of obtaining measurements. We may also omit some variables
because the results of statistical analyses cast doubt on their usefulness as
predictor, or explanatory, variables. Some writers suggest that under some circumstances
variables that fare poorly when subjected to statistical evaluation should remain
in the model, either because measurements on them are easily obtained or because
the logic of their presence is so strong.

Statistical criteria for excluding variables usually center around the magnitude
and significance of $R^2$ and the $b_j$’s. Ideally we would regress Y on each subset of
the independent variables and evaluate the results against the appropriate statistical
criterion. This procedure would involve regressing Y on each independent variable
alone, then on each possible pair, then each possible triplet, and so on. If there
are many independent variables, this can be time-consuming.

Two alternative approaches are the step-up procedure and the step-down
procedure. In the step-up procedure, we regress Y on each independent variable
separately. We retain the variable yielding the highest $R^2$. Next we regress Y on
all possible pairs of variables that we can form from the variable retained in the
first step and the $k - 1$ other variables. We retain the pair yielding the largest $R^2$
value. We continue this way until the $R^2$ we obtain is not significantly larger than
the $R^2$ we obtained in the preceding step.

In the step-down procedure, we regress Y on all independent variables at once.
We omit the least significant variable and regress Y on all the $k - 1$ remaining
variables at once. We omit the least significant variable from these $k - 1$
variables. We continue this way until all retained variables are contributing significantly
to the regression.
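
The step-up procedure can be sketched in code. The text does not say which test decides whether $R^2$ is "significantly larger", so this sketch assumes the usual F test for the increase in $R^2$ when one variable is added, with critical values supplied by the caller from an F table. The function names are illustrative, and the data are the timber measurements of Exercise 10.6.1, used here only as a convenient example.

```python
def r_squared(y, columns):
    """R^2 from a least-squares regression of y on the given columns (with intercept)."""
    n, k = len(y), len(columns)

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    y_dev = [v - sum(y) / n for v in y]
    x_dev = [[v - sum(col) / n for v in col] for col in columns]
    g = [dot(x, y_dev) for x in x_dev]                    # sums of cross products with y'
    a = [[dot(x_dev[i], x_dev[j]) for j in range(k)] + [g[i]] for i in range(k)]
    for i in range(k):                                    # Gauss-Jordan elimination
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [u - f * v for u, v in zip(a[r], a[i])]
    b = [a[i][k] / a[i][i] for i in range(k)]             # partial regression coefficients
    return dot(b, g) / dot(y_dev, y_dev)                  # SSR / SST


def step_up(y, predictors, f_crit_by_df):
    """Add one variable at a time while the increase in R^2 is significant.

    f_crit_by_df maps error degrees of freedom to the tabulated F (1 and df)
    at the chosen level; this stopping rule is an assumption of the sketch.
    """
    n = len(y)
    chosen, r2_old = [], 0.0
    remaining = list(predictors)
    while remaining:
        trials = {name: r_squared(y, [predictors[c] for c in chosen + [name]])
                  for name in remaining}
        best = max(trials, key=trials.get)
        r2_new = trials[best]
        df_error = n - (len(chosen) + 1) - 1
        if r2_new < 1.0:
            f_increase = (r2_new - r2_old) / ((1.0 - r2_new) / df_error)
            if f_increase <= f_crit_by_df[df_error]:
                break
        chosen.append(best)
        remaining.remove(best)
        r2_old = r2_new
    return chosen, r2_old


predictors = {
    "age": [15, 20, 24, 38, 44, 48, 56, 62, 63, 64],
    "fertility": [60, 76, 84, 85, 86, 85, 86, 86, 87, 87],
}
timber_yield = [0.8, 1.2, 2.3, 3.2, 4.3, 4.8, 5.0, 5.5, 6.3, 7.2]
F_05 = {8: 5.32, 7: 5.59}     # tabulated F, alpha = 0.05, 1 and 8 or 7 df
chosen, r2 = step_up(timber_yield, predictors, F_05)
print(chosen, round(r2, 4))
```

In this small example the procedure retains age alone: adding fertility raises $R^2$ only slightly, and the increase does not exceed the tabulated F.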

Each time an independent variable is added to the regression model, R 2 is 
increased to some extent. You should not, however, take this as a license to 
include every possible independent variable in the model. The magnitude of the 
increase in R 2 resulting from the addition of an independent variable may not 
justify the costs involved in collecting data on that variable. 

Cohen and Cohen (1975) discuss some of the limitations of stepwise procedures. 

Stepwise methods are laborious if the number of independent variables is large.
We can save much time and effort if we carry out the procedures on a computer.
[Various methods for obtaining the best subset of independent variables are
discussed in the papers by Allen (1971), Beale et al. (1967), Garside (1965), Gorman
and Toman (1966), Hocking (1967, 1972), Larson and Bancroft (1963), Lindley
(1968), Schultz and Goggans (1961), and Summerfield and Lubin (1951). The
books by Draper and Smith (1981), Neter and Wasserman (1974), Smillie (1966),
and Sprent (1969) also discuss this topic.]


10.8 ADDITIONAL TOPICS 


Regression and correlation analysis cannot be treated in complete detail in two 
chapters. So we have omitted many facets of these techniques from our discussion 
here. The following are brief summaries of some of the additional topics relating 
to regression and correlation that you can find discussed in greater detail in the 
books and articles that we have already cited. 

Dummy Variables The variables used in the examples and exercises of this
chapter and Chapter 9 have all been quantitative. Often, however, one or more of the
variables is a qualitative or categorical variable. For example, the regression model
may indicate that we should include such categorical variables as sex, religion,
membership in an ethnic group, and area of residence as independent variables.
Since regression-analysis techniques require that all variables be quantitative, the
question of how to quantify these categorical variables arises. A frequently used
technique is dummy variable coding. This technique lets us represent membership
in one of $k$ groups by a series of $k - 1$ dichotomies. We use scores of 0 and 1
as codes. An object or subject belonging to a specified group or category receives
a score of 1, and all subjects or objects not belonging to that group receive a
score of 0.

Consider, for example, a study in which marital status is included as an
independent variable in a multiple-regression equation designed to model consumer
behavior. The possible categories are Married, Single, Widowed, Divorced, and
Other. We could use the coding scheme shown in Table 10.8.1. Thus, for
example, when data are collected on a sample of consumers for model-testing
purposes, a married consumer would receive a score of 1 on independent variable $X_1$
and a score of 0 on variables $X_2$, $X_3$, and $X_4$. We can use the techniques described
in this chapter and Chapter 9 to analyze sample data coded in this manner. The
books by Cohen and Cohen (1975), Draper and Smith (1981), Kerlinger and
Pedhazur (1973), Mendenhall and McClave (1981), and Neter and Wasserman
(1974) include good discussions of the use of dummy-variable coding in regression
and correlation analysis. Daniel (1979) has prepared a bibliography on dummy
variables.
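
The coding scheme of Table 10.8.1 is easy to express in code. A minimal sketch, with illustrative names: four indicator variables represent the five marital-status groups, and the fifth group ("all other") is the one that scores 0 on every variable.

```python
# Order of the indicator variables X1..X4, following Table 10.8.1.
CODED_GROUPS = ["married", "single", "widowed", "divorced"]


def dummy_code(status):
    """Return [X1, X2, X3, X4] for one subject.

    Any status not listed in CODED_GROUPS (the "all other" group) is coded
    as all zeros, so k = 5 groups need only k - 1 = 4 variables.
    """
    return [1 if status == group else 0 for group in CODED_GROUPS]


sample = ["married", "single", "other", "divorced"]
coded = [dummy_code(s) for s in sample]
for status, row in zip(sample, coded):
    print(f"{status:10s} {row}")
```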

TABLE 10.8.1 Dummy-variable coding of marital status

                        Variable
Group              X1    X2    X3    X4
K1 = married        1     0     0     0
K2 = single         0     1     0     0
K3 = widowed        0     0     1     0
K4 = divorced       0     0     0     1
K5 = all other      0     0     0     0

Curvilinear Regression We have limited our discussion of regression and
correlation in Chapters 9 and 10 to linear models. In these models, as we have seen,
the dependent variable is expressed as a linear combination of the independent
variable or variables. In many cases, a linear model does not adequately show the
underlying relationships among the variables. The situation may be better
portrayed by some type of curve rather than a straight line.

For example, the yield of a crop may increase with increasing applications of
fertilizer up to a point, after which there is a decrease in yield. This type of
relationship suggests a parabola as an appropriate model. A nonlinear
multiple-regression model may be appropriate if we expect to add other independent
variables, such as rainfall and temperature. Figure 10.8.1 shows some examples of
curves that may serve as appropriate nonlinear models in regression and correlation
analysis.

When a nonlinear model applies, we may use least-squares methods to obtain
estimates of the population regression coefficients. The techniques are logical
adaptations of those discussed in this chapter. For detailed discussions of
regression analysis in the nonlinear case, consult the books by Ezekiel and Fox (1959),
Neter and Wasserman (1974), Smillie (1966), and Sprent (1969).

FIGURE 10.8.1 Examples of nonlinear curves that may serve as models in
regression analysis. Panels: $y = b_0 + b_1x + b_2x^2 + b_3x^3$ and $y = b_0 + b_1x + b_2x^2$.

Transformations When the data available for analysis do not fit a linear model,
we may look at a more complicated model, as explained in the preceding
paragraphs. Often we may wish to avoid a more complicated model because of the
problems its complexity may bring about. An alternative procedure is to use some
transformation on the data that will make the linear model appropriate. The nature
of the original data (that is, those characteristics that make them unsuited for the
application of a linear model) determines the type of transformation that is correct.
The following are some examples of transformations that we may use.

1. Logarithmic transformations are appropriate when the true model is multipli¬ 
cative. For example, if the true model is 

y = Poffie 

we may take the logarithm of both sides of the equation to obtain the linear model 
log,ay = log,oA, + X log,o/3, + log l0 e 

2. Square-root transformations are used when the data are the result of a Poisson 
process. Recall that the mean and variance are equal in a distribution that follows 
the Poisson law. Thus changes in the mean value of a Poisson-distributed de¬ 
pendent variable are accompanied by like changes in the variance. This violates 
the assumption of equal variances of the subpopulations of Y values for a given 
X. The square-root transformation tends to equalize the variances. 

3. Reciprocal transformations are appropriate when the true model is of the form 

1 

^ Po + P\X + e 

If we take the reciprocal of both sides of this equation, we have the linear equation 

■ = A) + /V + e 
y 

The books by Cohen and Cohen (1975), Draper and Smith (1966), and Neter and 
Wasserman (1974) discuss transformations in more detail. 

Multicollinearity One of the problems that may arise in regression analysis is 
multicollinearity. We say that multicollinearity exists between two (or more) var¬ 
iables when the relationship between (among) them is perfectly (or almost per¬ 
fectly) linear. When multicollinearity is present, estimates of parameters tend to 
have very large standard errors. They are also highly dependent on the particular 
data points observed in the sample. [There are methods for dealing with multi¬ 
collinearity. You will find them discussed in the book by Neter and Wasserman 
(1974). Daniel (1980) has prepared a bibliography on a technique for handling 
multicollinearity.] 

This chapter discussed the concepts and techniques of multiple-regression analysis 
and multiple-correlation analysis. It showed the differences between the assump¬ 
tions underlying the two models. 

We used an example to illustrate descriptive and inferential procedures. We 
suggested that regression analysis should include the following steps: 

1. Specification of the model

2. Review of the assumptions

3. Obtaining the equation

4. Evaluating the equation

5. Using the equation

You learned that the multiple-regression and multiple-correlation models are
straightforward extensions of the simple linear models. For each additional
independent variable, we add an additional term to the basic model.

The assumptions for multiple regression and correlation are also extensions of
those for the simple linear models. The geometric interpretations of the model
involve planes and hyperplanes, rather than straight lines.

We can estimate the parameters in the multiple-regression and multiple-correlation
models by the method of least squares. When the assumptions are met, we
evaluate the resulting equation by applying analysis of variance. We use t tests to
evaluate the individual regression coefficients.

A multiple-regression equation, like a simple linear regression equation, is used
for prediction and estimation.

In correlation analysis we are often not interested in the equation. We are
interested in the strength of the relationship among the variables as measured by
the multiple-correlation coefficient and the various partial correlation coefficients.

In addition to the references already cited, we suggest the articles by Jaech
(1966), Newton and Spurrell (1967), and Weiss (1970) for further reading.

Review Questions

1. Write out the multiple-regression model and explain each component.

2. State the assumptions underlying the multiple-regression model.

3. Explain fully the following terms: (a) coefficient of multiple determination, (b) multiple-correlation
coefficient, (c) simple correlation coefficient, (d) partial correlation coefficient.

4. Explain the differences between the assumptions of the regression model and those of 
the correlation model. 

5. Explain the difference between a prediction interval and a confidence interval. 

6. What is a multivariate normal distribution? 

7. Describe a situation in your area of interest in which multiple-regression analysis 
would be useful. Use real or realistic data and do a complete regression analysis. 

8. Describe a situation in your area of interest in which multiple-correlation analysis 
would be useful. Use real or realistic data and do a complete correlation analysis. 

9. An industrial psychologist conducts a study to examine those variables thought to be
related to on-the-job performance of technical employees. A random sample of 15 employees
gives the following results. (a) Find the multiple-regression equation describing
the relationship among these variables. (b) Compute the coefficient of multiple determination
and do an analysis of variance. In the table, Y denotes the employees' job performance
ratings, X1 denotes their scores on a job aptitude test, and X2 denotes the number of
in-service training units earned.


Y   54 37 30 48 37 37 31 49 43 12 30 37 61 31 31

X1  15 13 15 15 10 14 8 12 1 3 15 14 14 9 4

X 2 81 17423791 12 10 15 


10. In a study of job satisfaction among assembly-line employees of a large manufacturing
company, you collect the following data for a sample of 12 employees. Compute $R_{y.12}$
and test for significance at the 0.05 level. Determine the p value. In the table, Y denotes
the employee's score on a job-satisfaction measure, X1 denotes the supervisor's score on
a job-satisfaction measure, and X2 denotes the employee's self-confidence score.


Y   45 35 35 40 55 50 38 55 38 45 70 60

X1  39 40 40 42 45 43 44 47 49 48 50 50

X2  51 51 55 57 57 61 65 64 63 61 65 68


11. A personnel director with a large firm wants to know whether level of skill in a certain 
job can be predicted, using as predictor variables the age and experience of the employee. 
The following data are obtained on a random sample of 15 employees. (a) Find the least-squares
multiple-regression equation. (b) Compute $R^2_{y.12}$ and test for significance at the
0.05 level. Determine the p value for the test. (c) Test H0: β1 = 0 and H0: β2 = 0. Let
α = 0.05. Compute the p value for each test. (d) Compute a 95% confidence interval for
ft. (e) Let x 1 = 2 and x 2 = 25 and compute y. (f) Find the 95% prediction interval for 
Y. (g) Find the 95% confidence interval for the mean of Y when x x = 2 and x 2 = 25. 


Skill level   Experience   Age      Skill level   Experience   Age
    (Y)          (X1)      (X2)         (Y)          (X1)      (X2)
    15            0         21          30            2         25
    15            0         18          45            2         38
    21            0         22          50            3         44
    28            1         24          60            3         51
    30            1         25          45            4         39
    35            1         25          60            4         54
    40            1         26          50            5         55
    35            2         34




12. The following table shows the tensile strengths Y of 10 specimens of plastic. The
independent variables are the amounts of two ingredients included in the formula. (X1 is
the amount of ingredient 1 and X2 is the amount of ingredient 2.) (a) Find the least-squares
multiple-regression equation. (b) Compute $R^2_{y.12}$ and test for overall regression, using
ANOVA. Let α = 0.05. (c) Test H0: β1 = 0 and H0: β2 = 0. Let α = 0.05, and
determine the p value for each test. (d) Suppose that the amounts of ingredient 1 and
ingredient 2 added to the formula are 5 and 3, respectively. Construct a 95% prediction
interval for the tensile strength of a single specimen of plastic. (e) Using the values of x1
and x2 specified in (d), construct a 95% confidence interval for all specimens of plastic
produced.


Y   1361 1588 1815 2087 2268 2404 3402 3629 3765 4083

X1  8 7 4 5 5 4 3 3 2 1

X2  4 3 4 3 2 2 2 1 1 1


13. The following table shows the end-of-month balance (Y) of 15 young couples with
charge accounts at a large department store. The independent variables are number of years
married (X1) and age of wife (X2). (a) Do a complete regression analysis of these data.
Let α = 0.05, and determine the p value for each test. (b) Let x1 = 2 and x2 = 20 and
construct a 95% prediction interval for Y. (c) Using the values of x1 and x2 specified in
(b), construct the 95% confidence interval for the mean of Y.


End-of-month   Number of       Age of     End-of-month   Number of       Age of
balance        years married   wife       balance        years married   wife
(Y)            (X1)            (X2)       (Y)            (X1)            (X2)
110            1               25         98             3               25
115            1               24         99             4               30
120            1               22         98             4               24
118            1               24         100            5               29
110            2               20         90             5               30
108            2               20         93             5               30
105            2               20         90             6               28
104            3               24




14. The following data are based on a survey conducted in 10 market areas. The dependent
variable is the proportion of adults who say they prefer a certain brand of toothpaste. The
independent variables are per capita income (X1) and a measure of the educational level
(X2) of residents of the market areas. (a) Do a complete regression analysis of these data.
Let α = 0.05, and determine the p value for each test. (b) Let x1 = 5 and x2 = 6 and
construct the 95% prediction interval for Y. (c) Using the same values of x1 and x2 specified
in (b), construct the 95% confidence interval for the mean of Y.


Market   Proportion preferring   Per capita income   Index of education
area     the brand (Y)           (× 5000) (X1)       (X2)
1        61.6                    6.0                 6.3
2        53.2                    4.4                 5.5
3        65.5                    9.1                 3.6
4        64.9                    8.1                 5.8
5        72.7                    9.7                 6.8
6        52.2                    4.8                 7.9
7        50.2                    7.6                 4.2
8        44.0                    4.4                 6.0
9        53.8                    9.1                 2.8
10       53.5                    6.7                 6.7


15. In a study of the sales of a certain product by market area, a marketing researcher
collects the following data on a sample of market areas. All observations are coded for
ease of calculation. (a) Find the sample multiple-correlation coefficient and test the null
hypothesis that $\rho_{y.12} = 0$. (b) Find each of the partial correlation coefficients and test each
for significance. Let α = 0.05 for all tests. (c) Determine the p value for each test. (d)
State your conclusions.


Sales    Advertising         Market share   Sales    Advertising         Market share
(Y)      expenditures (X1)   (X2)           (Y)      expenditures (X1)   (X2)
149.0    21.00               42.50          169.0    24.92               48.95
152.0    21.79               43.70          172.0    25.50               49.90
155.7    22.40               44.75          174.5    25.80               50.30
159.0    23.00               46.00          176.1    26.01               50.90
163.3    23.70               47.00          176.5    26.15               50.85
166.0    24.30               47.90          179.0    26.30               51.10



16. An education researcher collects the following data on a sample of 11 college seniors
majoring in accounting. (a) Compute the multiple-correlation coefficient and test it for
significance. (b) Compute each of the partial correlation coefficients and test $r_{12.y}$ for
significance. Let α = 0.05 for all tests. (c) Determine the p value for each test. (d) What
are your conclusions? In the table, Y denotes students' scores on a verbal reasoning test,
X1 their scores on a mathematics aptitude test, and X2 their IQ scores.


Y   162.2 158.0 157.0 155.0 156.0 154.1 169.1 181.0 174.9 180.2 174.0

X1  51.0 52.9 56.0 56.5 58.0 60.1 58.0 61.0 59.4 56.1 61.2

X2  108 111 115 116 117 120 124 127 122 121 125


17. In a study of variables thought to be related to scholastic aptitude, a team of education 
researchers collects data on a sample of 15 students in their first year of college. The 
variables are as follows. 

Y = scholastic aptitude scores of students entering college 

X l = pupil-teacher ratio in high school from which student graduated 

X 2 = median family income of student’s community of residence 

X 3 = size of high school graduating class 

X 4 = student’s high school grade-point average 

The following is a partial computer printout of the results of the multiple-regression 
analysis. Prepare a verbal interpretation of these results. 



MULTIVARIATE CURVE FIT

VARIABLE   REGR COEFF             MEAN VALUE   STD DEV
1          (CONSTANT = 2.0886)    10.0887      2.40743
2          -.244325E-02           30.2888      8.75783
3          .175112                12.4         8.80102
4          .253181                4.53333      2.9409
5          1.72182                2.73333      .771729

INDEX OF DETERMINATION (R-SQ) = .91937

CORRELATION MATRIX

1              -.771389    .875112     .420715E-01    .870788
-.771389       1           -.88883     -.171498       -.889374
.875112        -.888831    1           -.241071       .845418
.420715E-01    -.171495    -.241071    1              -.289815
.870788        -.889374    .845418     -.289815       1

DO YOU WISH TABLE OF VALUES PRINTED? YES

ACTUAL VS CALCULATED

ACTUAL   CALCULATED   DIFFERENCE   PCT DIFFER
14       14.9571      -.957058     -6.3
13       11.4745      1.52546      13.2
6        6.85242      -.852423     -12.4
11       11.8923      -.892345     -7.5
10       9.28295      .717051      7.7




Y = annual sales (in $10,000 units)
X1 = sales experience (in years)
X2 = score on a sales aptitude test

A partial computer printout of the results of multiple-regression analysis follows. Prepare
a verbal interpretation of these results.



MULTIVARIATE CURVE FIT

VARIABLE   REGR COEFF              MEAN VALUE   STD DEV
1          (CONSTANT = .395081)    20.5         7.00368
2          .257827                 7.79999      3.68241
3          2.96621                 6.09999      1.81387

INDEX OF DETERMINATION (R-SQ) = .7526

CORRELATION MATRIX

1          .666929    .861978
.66693     1          .691797
.86198     .691897    1

DO YOU WISH TABLE OF VALUES PRINTED? YES

ACTUAL VS CALCULATED

ACTUAL   CALCULATED   DIFFERENCE   PCT DIFFER
22       16.7731      5.22693      31.1
15       13.0334      1.96661      15
18       13.8069      4.19312      30.3
10       15.9996      -5.9996      -37.4
16       19.4814      -3.48145     -17.8
29       26.1873      2.81265      10.7
35       33.6667      1.33327      3.9
15       18.32        -3.32004     -18.1
22       23.479       -1.47897     -6.2
23       24.2525      -1.25246     -5.1

ANALYSIS OF VARIANCE

SOURCE       SS        DF   MS
REGRESSION   369.162   2    184.581
ERROR        121.354   7    17.3362
TOTAL        490.516   9

*SIGNIFICANT AT 1% LEVEL

VARIABLE   COEFFICIENT   STD ERROR   T STATISTIC
2          .257827       .495103     .520753
3          2.96621       1.00514     2.95103

D.F. = 7


19. A health research team collects data on 10 communities. Measurements are obtained 
on the following variables: 

Y = health care facility utilization index
X1 = median family income
X2 = proportion of workers with health insurance
X3 = doctor-population ratio

A partial computer printout of the results of multiple-regression analysis follows. Prepare 
a verbal interpretation of these results. 

MULTIVARIATE CURVE FIT

VARIABLE   REGR COEFF             MEAN VALUE   STD DEV
1          (CONSTANT = 7.62552)   20.1         6.90586
2          .621658                8.19999      4.42268
3          16.9724                .574999      .202794
4          -.313452               7.59999      4.07923

INDEX OF DETERMINATION (R-SQ) = .814088

CORRELATION MATRIX

1           .804793     .805102     -.49909
.804794     1           .646684     -.455618
.805102     .646684     1           -.265937
-.499089    -.455618    -.265937    1

DO YOU WISH TABLE OF VALUES PRINTED? YES

ACTUAL VS CALCULATED

ACTUAL   CALCULATED   DIFFERENCE   PCT DIFFER
20       12.691       7.309        57.5
15       15.5608      -.560793     -3.6
17       19.6529      -2.65291     -13.4
10       13.1827      -3.18268     -24.1
16       19.026       -3.02602     -15.9
28       26.7931      1.20692      4.5
35       34.1717      .828293      2.4
15       16.3832      -1.38318     -8.4
22       22.5882      -.588226     -2.6
23       20.9502      2.04979      9.7

ANALYSIS OF VARIANCE

SOURCE       SS        DF   MS
REGRESSION   388.246   3    129.415
ERROR        88.6636   6    14.7773
TOTAL        476.91    9

*SIGNIFICANT AT 5% LEVEL

VARIABLE   COEFFICIENT   STD ERROR   T STATISTIC
2          .621658       .390581     1.59162
3          16.9724       7.86578     2.15774
4          -.313452      .33507      -.935481

D.F. = 6


20. A sample survey is conducted to study the relationship between the amount of life
insurance held by business executives and several other variables. The following data are
collected. (a) Find the least-squares multiple-regression equation. (b) Compute R² (y.123) and
test for significance at the 0.05 level. Determine the p value. (c) Test H0: β1 = 0, H0:
β2 = 0, H0: β3 = 0. Let α = 0.05, and determine the p value for each test. In the table,
Y denotes the amount of life insurance held by the executives (× $10,000), X1 denotes
their annual incomes (× $1000), X2 denotes the number of their children, and X3 denotes
their ages.


Y   30 60 75 70 50 50 20 40 45 50 50 20 20 30 25

X1  25 42 37 57 52 48 28 53 51 52 54 32 29 29 25

X2  2 3 3 3 4 2 1 4 3 2 4 2 1 1 1

X3  31 35 42 60 58 49 28 55 52 35 38 41 31 32 30


21. Select a simple random sample of size 40 from the population of employed heads of 
households in Appendix II. Do a complete multiple-regression analysis of the data, using 
size of residence as the dependent variable and age, salary, education, length of time with 
current employer, and size of family as the independent variables. 



Using Regression Analysis to Help Select Store Locations 


Deciding where to locate a new retail store is one of the most important
decisions that the managers of such firms have to make. Lord and Lynds*
review the use of regression techniques in research on the location of stores,
and present a case study of location of a new liquor store.

*Dennis J. Lord and Charles D. Lynds, "The Use of Regression Models in Store Location Research: A Review
and Case Study," Akron Business and Economic Review, 12 (Summer 1981), 13-19.

Their study used regression techniques to identify location variables related
to the sales performance of a sample of 16 liquor stores located in a certain
area. The dependent variable was volume of sales. The predictor variables were:
population within 1.5 miles of the store (POP), mean household income of this
population (MHI), distance from subject store to nearest other liquor store
(DIS), daily volume of traffic on the street on which the store was located (TFL),
and amount of employment within 1.5 miles of the store (EMP). They obtained
the following results.


Variable   Regression coefficient   t
POP         0.09460                  5.20
MHI         0.06129                  2.98
DIS         4.88524                  2.83
TFL        -2.59040                 -2.11
EMP        -0.00245                 -0.54

R² = 0.69


What conclusions can you draw from these results? Let α = 0.05. What assumptions
are necessary? What other variables do you think might be worth
considering as predictors of sales volume of a retail store? Why do you think
the regression coefficient for TFL is negative?


11. The Chi-Square Distribution and the Analysis of Frequencies


Chapter Objectives: In this chapter, you'll learn to 
extend the concepts of statistical hypothesis testing 
involving attributes to cover situations in which sam¬ 
pling is from populations whose elements can be clas¬ 
sified into one of three or more categories. You'll find 
that the testing procedures are simplified if you use 
the chi-square distribution as an approximate sam¬ 
pling distribution to compare observed frequencies 
with those expected under the null hypothesis. After 
studying this chapter and working the exercises, you 
should be able to do the following. 

1. Justify the use of the chi-square distribution as an 
approximate sampling distribution in comparing 
observed frequencies with expected frequencies 

2. Calculate expected frequencies for goodness-of-fit 
tests, tests of independence, and tests of homo¬ 
geneity 

3. Explain how to determine degrees of freedom for 
each of the above tests 

4. Use the chi-square distribution to conduct statistical 
tests of hypotheses for goodness-of-fit tests, tests 
of independence, and tests of homogeneity 




11.1 INTRODUCTION 


For most of the statistical techniques discussed in Chapters 8 through 10, you 
need measurements, such as weight, length, diameter, distance, amount of money, 
or scores on some type of test. But often the data available for analysis consist 
of frequency counts rather than measurements. The following examples illustrate 
the point. 

Suppose that a market analyst wishes to evaluate the effectiveness of three 
different advertising strategies. The analyst selects three groups of 100 people 
each, and exposes each group to one of the three strategies. Effectiveness is 
measured in terms of the proportion in each group that buys the advertised product. 
The pertinent question is whether there are real differences in these proportions. 

In another situation, we may hypothesize that the number of defective items 
coming off an assembly line during each 8-hour shift has a Poisson distribution. 
For several shifts, we count the number of defective items. Do the observed data 
provide sufficient evidence to cause rejection of the hypothesis? 

Or consider the case in which two factories produce some product under sup¬ 
posedly identical conditions. We examine a sample of items from each factory. 
For each sample, we count the number of items that fail to meet certain quality 
standards. Based on these results, should we conclude that the two samples really 
come from two different populations? That is, can we conclude that the factories 
are different? 

These examples have at least two things in common: (1) The raw data consist 
of counts, or frequencies, and (2) under the implied or stated hypotheses, there 
are expected frequencies with which we can compare the observed frequencies. 
Such a comparison can lead to some useful conclusions. 

If the three advertising strategies do not differ in effectiveness, for example, 
we would expect about the same number of people from each group to buy the 
product, all other things being equal. Or if the number of defective items coming 
off an assembly line during an 8-hour shift is in fact Poisson distributed, we would 
expect a sample of shifts to yield data compatible with the hypothesis. Finally, if 
conditions in two factories producing the same product are identical, we would 
expect, in samples of equal size from the two factories, about the same number 
of items to fail to meet the standards. 

The question to be answered in each of these cases is whether the discrepancies 
between observed and expected frequencies are so large as to cast doubt on the 
assumptions that gave rise to the expected frequencies. 

The statistical technique we use to provide answers to such questions is based 
on the chi-square distribution. The technique was introduced in 1900 by Karl 
Pearson. 

11.2 MATHEMATICAL PROPERTIES OF THE 
CHI-SQUARE DISTRIBUTION 

Chapter 6 introduced the chi-square distribution in connection with the construction
of confidence intervals for a population variance. We can express this distribution
mathematically as

$$ f(u) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, u^{(n/2)-1} e^{-u/2}, \qquad u > 0 \tag{11.2.1} $$

where

$$ u = \sum_{i=1}^{n} \left( \frac{x_i - \mu_i}{\sigma_i} \right)^2 $$

and

$$ e = 2.71828\ldots $$

Here $\Gamma$ denotes the gamma function, and n is called the degrees of freedom. The
x_i are normally and independently
distributed with means μ_i and standard deviations σ_i. The subscript i on μ and σ
indicates that it is possible for each observation to be drawn from a different
population. When we draw all the observations from the same population, we
drop the subscripts on μ and σ. The mean and variance of the distribution are n
and 2n, respectively. This distribution is usually designated by the Greek letter
χ² (chi squared).

The chi-square distribution is the sampling distribution that results when we
draw n values of the normally distributed random variable X repeatedly and at
random, transform each value of x to the standard normal, and square and sum
the resulting n variables. Suppose, for example, that the random variable X is
normally distributed with mean μ and standard deviation σ. Let us draw a large
number of independent random samples of size n from this population. Then let
us transform each value of x within each sample to the standard normal in the
usual manner, as follows:

$$ z_i = \frac{x_i - \mu}{\sigma} $$

Let us square this equation and add over all observations within each individual
sample. Then for each sample we have

$$ u = \sum_{i=1}^{n} z_i^2 = \sum_{i=1}^{n} \left( \frac{x_i - \mu}{\sigma} \right)^2 $$

If we make a list of the different values of u along with their relative frequency
of occurrence, we will have the sampling distribution of $u = \sum z^2$, which is the
chi-square distribution with n degrees of freedom. Figure 6.11.1 shows chi-square
distributions for several degrees of freedom.
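
The sampling experiment just described can be imitated on a computer. The sketch below is only an illustration: the population mean of 50, standard deviation of 10, sample size of 5, and the 20,000 repetitions are arbitrary choices of ours, and the function name `simulate_chi_square` is hypothetical. It checks that the simulated values of u have a mean near n and a variance near 2n.

```python
import random
import statistics


def simulate_chi_square(n, mu, sigma, num_samples, seed=0):
    """Draw num_samples samples of size n from a normal population and
    return u = sum of squared standard normal deviates for each sample."""
    rng = random.Random(seed)
    values = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        # Standardize with the known population mean and standard deviation.
        values.append(sum(((x - mu) / sigma) ** 2 for x in sample))
    return values


n = 5
u_values = simulate_chi_square(n=n, mu=50.0, sigma=10.0, num_samples=20000)
print("mean of u:    ", round(statistics.mean(u_values), 2), " theory:", n)
print("variance of u:", round(statistics.pvariance(u_values), 2), " theory:", 2 * n)
```

Because the values are random, the printed figures will only approximate n and 2n; the agreement improves as the number of simulated samples grows.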

The importance of the chi-square distribution rests on the fact that for large
samples the quantity

$$ X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \tag{11.2.3} $$

is distributed approximately as χ². In Equation 11.2.3, O_i = an observed frequency,
E_i = an expected frequency, and k = the number of pairs of observed
and expected frequencies. Observed and expected frequencies may arise from
situations such as those in Section 11.1. Later sections will show how we can use
Equation 11.2.3 to test hypotheses of a practical nature.


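We may rewrite Equation 11.2.3 as

$$ X^2 = \sum_{i=1}^{k} \frac{O_i^2}{E_i} - n \tag{11.2.4} $$

where n here is the total number of observations, that is, the sum of the observed
frequencies (and also of the expected frequencies).
This makes the calculations easier to perform by hand or on a calculator. One
disadvantage of Formula 11.2.4 is the fact that it does not clearly indicate which
categories contribute most to the overall lack of agreement between observed and
expected frequencies.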

A more extensive discussion of the mathematical properties of the chi-square 
distribution may be found in most textbooks on mathematical statistics. [See also 
the book by Lancaster (1969), which is devoted entirely to this distribution.] 
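
To see that Equations 11.2.3 and 11.2.4 give the same value, the sketch below computes X² both ways. The four category counts are hypothetical (they are not data from the text), and the function names are ours. The definitional form also yields a contribution for each category, which is the information the shortcut form hides.

```python
def chi_square_definitional(observed, expected):
    """X^2 by Equation 11.2.3: sum of (O - E)^2 / E over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_shortcut(observed, expected):
    """X^2 by Equation 11.2.4: sum of O^2 / E, minus the total count n."""
    n = sum(observed)
    return sum(o ** 2 / e for o, e in zip(observed, expected)) - n


# Hypothetical observed counts for four categories, with 25 expected in each.
observed = [18, 25, 30, 27]
expected = [25.0, 25.0, 25.0, 25.0]

# Per-category contributions, available only from the definitional form.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

print("Equation 11.2.3:", chi_square_definitional(observed, expected))
print("Equation 11.2.4:", chi_square_shortcut(observed, expected))
print("contributions:  ", contributions)
```

The two forms agree only when the expected frequencies add to the same total as the observed frequencies, which is the usual situation in the tests of this chapter.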


11.3 TESTS FOR GOODNESS OF FIT 

Here we want to determine whether sample data are compatible with the hypothesis 
that they were drawn from a population that follows some specified functional 
form, such as the uniform distribution or the normal distribution. We can
reach decisions in this type of case by means of a chi-square goodness-of-fit test. 
To carry out this test, we specify a set of k mutually exclusive categories and note 
the observed frequency of occurrence of sample values in each category. We use 
the properties of the distribution from which we hypothesize that the sample was 
drawn to determine expected frequencies for each category. The magnitude of the 
discrepancy between the observed and expected frequencies forms the basis of the 
test. 

EXAMPLE 11.3.1 The Uniform Distribution. An ad agency asks each member of a 
random sample of 60 viewers to indicate which of six television programs he or 
she prefers. The results are as follows. 


Program 1 2 3 4 5 6 Total 

Number 5 8 10 12 12 13 60 

Can we conclude from these data that the programs are not equally preferred? 

If there is no preference, we would expect the same number of votes for each 
program. In other words, we would expect the total number of votes to be dis¬ 
tributed uniformly among the six programs. Pursuing this line of reasoning, we 
may set up the following hypotheses: 

H0: The programs are equally preferred
H1: The programs are not equally preferred

Suppose that we let α = 0.05. Under the null hypothesis, the expected number
of votes for each program is 60/6 = 10. By Equation 11.2.3, we may now
compute the following value of X²:

$$ X^2 = \frac{(5 - 10)^2}{10} + \frac{(8 - 10)^2}{10} + \frac{(10 - 10)^2}{10} + \frac{(12 - 10)^2}{10} + \frac{(12 - 10)^2}{10} + \frac{(13 - 10)^2}{10} = 4.6 $$


The number of degrees of freedom is equal to the number of categories minus
1, or 6 - 1 = 5. From Appendix Table F, the critical value of chi-square for
α = 0.05 and 5 degrees of freedom is 11.070. Since 4.6 is less than 11.070, we
cannot reject H0. We conclude that the votes may be uniformly
distributed among the six programs (p > 0.10).
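
The arithmetic of Example 11.3.1 can be checked with a few lines of code. This is only a sketch: the critical value 11.070 is copied from the text rather than computed, and the variable names are ours.

```python
# Votes for programs 1 through 6, as given in Example 11.3.1.
observed = [5, 8, 10, 12, 12, 13]

# Under H0 (equal preference) every program expects the same count: 60 / 6 = 10.
expected_each = sum(observed) / len(observed)

# Test statistic from Equation 11.2.3.
x2 = sum((o - expected_each) ** 2 / expected_each for o in observed)

# Degrees of freedom: number of categories minus 1.
df = len(observed) - 1

# Critical value for alpha = 0.05 and 5 degrees of freedom, as quoted from Appendix Table F.
CRITICAL_VALUE = 11.070
reject_h0 = x2 > CRITICAL_VALUE

print("X^2 =", round(x2, 4), "with", df, "degrees of freedom")
print("reject H0 at the 0.05 level:", reject_h0)
```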

The assumption of a normally distributed population underlies many of the 
hypothesis-testing procedures we have discussed. The chi-square goodness-of-fit 
test provides a means of testing this assumption. 


EXAMPLE 11.3.2 The Normal Distribution. A simple random sample of 700 as¬ 
sembly-line workers took part in an experiment to determine how much time was 
needed for a certain task after they had taken a certain training course. Table 
11.3.1 shows the frequency distribution of the time in seconds that the subjects 
needed to complete the task. We wish to determine whether these data provide 
sufficient evidence to indicate that the observations were not drawn from a normal 
population. 

The appropriate null and alternative hypotheses are as follows: 

H0: The sample data come from a normally distributed population
H1: The sample data do not come from a normally distributed population

Assume α = 0.05.

TABLE 11.3.1 Frequency distribution of time (in seconds) required to complete a task

Time           Number of subjects
0-9.99                 38
10-19.99               51
20-29.99               62
30-39.99               74
40-49.99               83
50-59.99               91
60-69.99               81
70-79.99               72
80-89.99               61
90-99.99               52
100-109.99             35
Total                 700

x̄ = 54.71, s = 27.61

Equation 11.2.3 provides the test statistic. In this example, k is equal to the
number of specified class intervals, and the O_i are the observed frequencies. The
E_i are yet to be determined. The critical value of the test statistic is the value of
χ² from Table F corresponding to α = 0.05 and the appropriate degrees of
freedom. It can be shown that when the null hypothesis is true, X² defined by
Equation 11.2.3 is distributed approximately as χ² with k - r degrees of freedom.
The proof is given by Cramer (1946). In determining the degrees of freedom, as
we have noted, k is the number of categories into which the observed and expected
frequencies are cast, and r is the number of restrictions, or constraints, imposed
on the data. We shall explain the nature of these constraints later.

Before we can compute the value of the test statistic X², we must determine
the expected frequencies. According to the principles of probability distributions,
we can find the relative frequency of occurrence of values of a given magnitude
by finding the area under the curve defined by the distribution. In the case of the
normal distribution, for example, the relative frequency of occurrence of values
equal to or greater than some specified value x0 is equivalent to the area under
the curve and to the right of x0. See Figure 11.3.1. We find the numerical value
corresponding to this area by converting x0 to a standard normal deviate by the
formula z0 = (x0 - μ)/σ and finding the area in the standard normal table.

The present null hypothesis has not specified the mean and variance of the
hypothesized normal distribution. Using the sample mean of 54.71 and the sample
standard deviation of 27.61 as estimates of μ and σ, we can compute the area
under the normal distribution between the boundaries of each of the specified class
intervals.

For example, to compute the area, and hence the relative frequency of occurrence
of values, corresponding to the interval 10-19.99, we proceed as follows:
The $z$ corresponding to $x = 10$ is equal to $(10 - 54.71)/27.61 = -1.62$. The
$z$ corresponding to $x = 20$ is equal to $(20 - 54.71)/27.61 = -1.26$. The area
between $-1.62$ and $-1.26$ is equal to 0.0512. Thus 5.12% of the values are
expected (under the null hypothesis that the population is normally distributed) to
fall between 10 and 20. The expected frequency is (0.0512)(700) = 35.84. Using
this procedure, we may construct columns 3 through 5 of Table 11.3.2. From the
data in these columns, we compute the entries in column 6, which sum as follows:


$$X^2 = \frac{(38 - 36.82)^2}{36.82} + \frac{(51 - 35.84)^2}{35.84} + \cdots + \frac{(35 - 35.35)^2}{35.35} = 20.3558$$


FIGURE 11.3.1 Normal distribution showing $P(x \ge x_0)$: the relative frequency
of occurrence of values equal to or greater than $x_0$.

If the null hypothesis is true, there should be close agreement between the
observed and expected frequencies. This close agreement leads to a relatively
“small” value of $X^2$. If, on the other hand, the null hypothesis is false, there is
no basis for expecting close agreement between the observed and expected
frequencies. Consequently, the computed value of $X^2$ is likely to be relatively “large.”
In order to determine whether the computed value of $X^2$ is “small” or “large,”
we must have some criterion with which to compare it. This criterion is the
tabulated value of $\chi^2$ for $\alpha = 0.05$ and the appropriate degrees of freedom. To
determine the degrees of freedom, consider the number of constraints $r$ that have
been imposed on the data. As Table 11.3.2 indicates, the expected frequencies
must add to 700, the sample size. This is one constraint. We have imposed two
additional constraints on the data because we had to estimate $\mu$ and $\sigma$ from the
sample data. Thus $r = 3$, and the degrees of freedom are $k - r = 11 - 3 = 8$.
If $\mu$ and $\sigma$ had been specified in the null hypothesis, we would not have needed
to estimate these parameters from the sample data. Thus $r$ would have been equal
to 1. In general, in goodness-of-fit tests of this type, the number of degrees of
freedom is equal to the number of categories, less 1 for forcing the sum of the
expected frequencies to equal the sum of the observed frequencies, less 1 for each
parameter that has to be estimated from sample data.

In the present example, the critical value of $\chi^2$ is 15.507. Since the computed
value of $X^2 = 20.3558$ is larger than this, we reject the null hypothesis and
conclude that the sample data did not come from a normally distributed population
($0.01 > p > 0.005$).
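The computations of Example 11.3.2 can be sketched in a few lines of Python. This is a minimal illustration, not the text's own procedure: the function name `normal_cdf` and the variable names are assumptions introduced here, and the standard normal areas come from the error function rather than from the printed normal table, so the result agrees with 20.3558 only to within rounding.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Sample statistics and observed frequencies from Table 11.3.2.
mean, sd, n = 54.71, 27.61, 700
lower_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
observed = [38, 51, 62, 74, 83, 91, 81, 72, 61, 52, 35]

# z at each class boundary, rounded to two decimals as one would for a table lookup.
z_values = [round((x - mean) / sd, 2) for x in lower_limits]

# Cumulative areas at the boundaries; 0 and 1 close the two open-ended classes.
cumulative = [0.0] + [normal_cdf(z) for z in z_values] + [1.0]
areas = [cumulative[i + 1] - cumulative[i] for i in range(len(observed))]
expected = [n * area for area in areas]

# Test statistic: sum of (O - E)^2 / E over all categories.
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

k = len(observed)   # 11 categories
r = 3               # 1 for the fixed total, 2 for estimating mu and sigma
df = k - r
critical = 15.507   # tabulated chi-square value quoted in the text (alpha = 0.05, 8 df)

print(f"X^2 = {x2:.4f}, df = {df}, reject H0: {x2 > critical}")
```

Rounding each $z$ to two decimals mirrors the table lookup used in the example; dropping the rounding changes the expected frequencies slightly but not the decision.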


If you estimate parameters from ungrouped sample data rather than from grouped 
data as in our example, the distribution of X 2 may not be sufficiently approximated 
by the chi-square distribution to give good results. [The problem is discussed by 
Dahiya and Gurland (1972) and Watson (1957, 1958, 1959).] We encounter the
same problem when parameters are estimated independently of the sample. [This 
is discussed by Chase (1972).] 

Note that all the chi-square tests discussed in this chapter are one-sided tests 
with the rejection region in the right tail of the distribution of the test statistic. 


TABLE 11.3.2 Class intervals, expected frequencies, observed frequencies, and
$(O_i - E_i)^2/E_i$ for Example 11.3.2

(1)            (2)          (3)                   (4)                   (5)          (6)
Class          Observed     $z = (x - \bar{x})/s$   Area under normal     Expected     $(O_i - E_i)^2/E_i$
interval       frequency    at lower limit        curve (expected       frequency
               $O_i$        of interval           relative frequency)   $E_i$

Less than 10    38          (none)                0.0526                 36.82        0.0378
10-19.99        51          -1.62                 0.0512                 35.84        6.4125
20-29.99        62          -1.26                 0.0829                 58.03        0.2716
30-39.99        74          -0.89                 0.1114                 77.98        0.2031
40-49.99        83          -0.53                 0.1344                 94.08        1.3049
50-59.99        91          -0.17                 0.1428                 99.96        0.8031
60-69.99        81           0.19                 0.1335                 93.45        1.6587
70-79.99        72           0.55                 0.1124                 78.68        0.5671
80-89.99        61           0.92                 0.0785                 54.95        0.6661
90-99.99        52           1.28                 0.0498                 34.86        8.4274
100 or more     35           1.64                 0.0505                 35.35        0.0035

Total          700                                1.0000                700.00       20.3558



Small Expected Frequencies A word of caution is appropriate here regarding the 
use of the chi-square test with small expected frequencies. It is generally agreed 
that approximating the distribution of $X^2$ by $\chi^2$ is not strictly valid when there are
small expected frequencies. There is no general agreement, however, on what 
constitutes small expected frequencies. Many writers follow the suggestions of 
Cochran (1952, 1954), who states that in goodness-of-fit tests of the type covered 
in this chapter, no expected frequency should be below 1. Some writers require 
minimum expected frequencies of 5. This text will follow Cochran’s rule. 

If some categories have expected frequencies below the minimum, we may
combine them with adjacent categories to achieve the minimum required
frequency. If we do this, we must reduce $k$ accordingly when we determine degrees
of freedom. [Additional research on small expected frequencies and chi-square
has been reported by Lewontin and Felsenstein (1965), Roscoe and Byars (1971),
Slakter (1965, 1966), Tate and Hyer (1969), and Yarnold (1970).]
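One way to mechanize the combining step is sketched below. The function `combine_small_tails` and the frequencies used to exercise it are hypothetical, introduced only for illustration. The sketch examines only the two end categories, which is where small expected frequencies usually occur in goodness-of-fit tests; a small expected frequency in an interior category would need separate handling.

```python
def combine_small_tails(observed, expected, minimum=1.0):
    """Fold end categories into their neighbours until the expected
    frequency at each end reaches `minimum` (Cochran's rule uses 1)."""
    obs, exp = list(observed), list(expected)
    # Lower tail: merge the first category into the second.
    while len(exp) > 1 and exp[0] < minimum:
        obs[1] += obs[0]
        exp[1] += exp[0]
        del obs[0], exp[0]
    # Upper tail: merge the last category into the one before it.
    while len(exp) > 1 and exp[-1] < minimum:
        obs[-2] += obs[-1]
        exp[-2] += exp[-1]
        del obs[-1], exp[-1]
    return obs, exp

# Hypothetical frequencies, not taken from the text.
observed = [2, 6, 12, 9, 1, 0]
expected = [0.6, 5.4, 13.0, 9.5, 1.2, 0.3]

obs_c, exp_c = combine_small_tails(observed, expected)
k = len(obs_c)  # the reduced number of categories used for degrees of freedom
print(obs_c, [round(e, 2) for e in exp_c], k)
```

The reduced category count `k` returned by this step, not the original count, is the one that enters the degrees-of-freedom calculation.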


EXAMPLE 11.3.3 The Binomial Distribution. A market analyst conducts a study on
the attitudes of grocery shoppers toward savings stamps. The study involves
interviewing a random sample of 25 regular shoppers at each of 100 supermarkets.
The analyst interviews each shopper to determine whether that shopper would
prefer some form of savings other than stamps. Table 11.3.3 shows the results.
The analyst sets up the following hypotheses:

$H_0$: the number of shoppers in samples of size 25 preferring some other
form of savings is binomially distributed

$H_1$: the number of shoppers in samples of size 25 preferring some other
form of savings is not distributed as a binomial

The significance level $\alpha$ was set at 0.05.

We can use a chi-square goodness-of-fit test. Since the binomial parameter $p$
is not specified, we must estimate it from the sample data, with a resulting loss
of 1 degree of freedom. We estimate $p$ as follows:

$$\hat{p} = \frac{4(0) + 5(1) + 8(2) + \cdots + 16(7) + 10(8) + 6(9)}{2500} = 0.20$$

Since the analyst interviewed 25 shoppers at each of 100 stores, a total of 2500
shoppers were interviewed, as indicated in the denominator of $\hat{p}$. The numerator
of $\hat{p}$ shows how many of the 2500 shoppers prefer some other form of savings.

We can find the expected relative frequencies by evaluating the following
function for $x = 0, \ldots, 25$:

$$f(x) = \binom{25}{x}(0.2)^x(0.8)^{25-x}$$

where $x$ is the number of shoppers at a given store who prefer some other form
of savings. We may use Appendix Table A for this purpose. Multiplying each of
these probabilities by 100 gives the expected frequencies. For example, the table
shows that the probability of observing no shoppers favoring some other form of
savings when the null hypothesis is true is 0.0038. Multiplying 0.0038 by 100
gives an expected frequency of 0.38. We can compute the other expected
frequencies in a like manner. Columns 3 and 4 of Table 11.3.4 show the results.

TABLE 11.3.3 Sample results, Example 11.3.3

Number of shoppers preferring
some other form of savings      0   1   2   3   4   5   6   7   8   9   10 or more   Total
Number of stores                4   5   8  10  14  15  12  16  10   6    0            100

TABLE 11.3.4 Observed frequencies, expected relative frequencies, and expected
frequencies, Example 11.3.3

Number preferring    Number of    Binomial probability    Expected number    $(O_i - E_i)^2/E_i$
some other form      stores       (expected relative      of stores, $E_i$
of savings           $O_i$        frequency)

0                      4          0.0038                    0.38
1                      5          0.0236                    2.36
(0 and 1 combined)     9                                    2.74               14.302
2                      8          0.0708                    7.08                0.120
3                     10          0.1358                   13.58                0.944
4                     14          0.1867                   18.67                1.168
5                     15          0.1960                   19.60                1.080
6                     12          0.1633                   16.33                1.148
7                     16          0.1109                   11.09                2.174
8                     10          0.0623                    6.23                2.281
9                      6          0.0295                    2.95                3.153
10 or more             0          0.0173                    1.73                1.730

Total                100          1.0000                  100.00               28.100

The first expected frequency in the table is less than 1. Therefore, following 
Cochran’s rule for small expected frequencies, we combine the first two categories 
as shown in the table. The computed value of the test statistic, then, is 


$$X^2 = \frac{(9 - 2.74)^2}{2.74} + \frac{(8 - 7.08)^2}{7.08} + \cdots + \frac{(0 - 1.73)^2}{1.73} = 28.100$$


The associated degrees of freedom are 10 (the number of categories after
combining), less 1 for forcing the total of the expected frequencies to agree with the
total of the observed frequencies, less 1 because $p$ had to be estimated from the
sample data. Thus, the degrees of freedom are $10 - 2 = 8$. The critical value
of $\chi^2$ for $\alpha = 0.05$ and 8 degrees of freedom is 15.507. Since the computed
value of $X^2$ is greater than the critical value of $\chi^2$, we reject the null hypothesis.
We conclude that the data did not come from a binomial distribution ($p < 0.005$).
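As a check on Example 11.3.3, the following Python sketch estimates $p$, evaluates the binomial probabilities directly, and recomputes the test statistic. The names (`stores`, `binom_pmf`, and so on) are assumptions made for this illustration. Because the probabilities are not rounded to four decimals as in Appendix Table A, the statistic differs slightly from the 28.100 shown in Table 11.3.4.

```python
import math

# Table 11.3.3: stores at which x of the 25 shoppers preferred another form
# of savings, for x = 0, 1, ..., 9; the final entry is "10 or more".
stores = [4, 5, 8, 10, 14, 15, 12, 16, 10, 6, 0]
n_trials = 25
n_stores = sum(stores)

# Estimate p: shoppers preferring another form / all shoppers interviewed.
p_hat = sum(x * count for x, count in enumerate(stores)) / (n_trials * n_stores)

def binom_pmf(x, n, p):
    """Binomial probability of exactly x successes in n trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

probs = [binom_pmf(x, n_trials, p_hat) for x in range(10)]
probs.append(1.0 - sum(probs))              # probability of "10 or more"
expected = [n_stores * pr for pr in probs]

# Cochran's rule: the expected frequency for x = 0 is below 1,
# so categories 0 and 1 are combined.
obs_c = [stores[0] + stores[1]] + stores[2:]
exp_c = [expected[0] + expected[1]] + expected[2:]

x2 = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))
df = len(obs_c) - 2    # less 1 for the total, less 1 for estimating p
critical = 15.507      # tabulated value quoted in the text (alpha = 0.05, 8 df)

print(f"p_hat = {p_hat:.2f}, X^2 = {x2:.3f}, df = {df}, reject H0: {x2 > critical}")
```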


EXAMPLE 11.3.4 The Poisson Distribution. The manager of a resort hotel studies
the pattern of cancellations over a 90-day period. She observes the results shown
in Table 11.3.5. Are these data compatible with the hypothesis that the number
of daily cancellations is Poisson distributed? Let $\alpha = 0.05$.



TABLE 11.3.5 Resort hotel cancellation data, Example 11.3.4

Number of cancellations         0    1    2    3    4    5    6    7    8    9 or more   Total
Number of days this number
of cancellations was received   9   17   25   15   11    7    2    2    2    0            90


Since the Poisson parameter $\lambda$ is not given, we estimate it from the data as
follows:

$$\hat{\lambda} = \frac{0(9) + 1(17) + 2(25) + 3(15) + 4(11) + 5(7) + 6(2) + 7(2) + 8(2)}{90} = \frac{233}{90} = 2.6$$


We use the sample mean to estimate $\lambda$, since, as you will recall from Chapter
4, $\lambda$ is the mean of the Poisson distribution. You will also recall that the Poisson
distribution is expressed by

$$f(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \qquad x = 0, 1, 2, 3, \ldots$$


Using the estimate of $\lambda$, 2.6, and assuming a Poisson distribution, we can find
the probabilities for the different values of $X$ in Appendix Table B. Multiplying
each of these probabilities by 90 gives the expected frequency for each value of
$X$. We then compute the value of $X^2$ in the usual way. Table 11.3.6 shows the
probabilities and expected frequencies. From these data, we compute

$$X^2 = \frac{(9 - 6.66)^2}{6.66} + \frac{(17 - 17.37)^2}{17.37} + \cdots + \frac{(4 - 1.53)^2}{1.53} = 6.674$$


The degrees of freedom for this example are 6. The number of categories after
combining to adjust for the small expected frequencies was 8. One degree of
freedom was lost for each of two constraints: (1) forcing the total of the expected
frequencies to agree with the total of the observed frequencies, and (2) estimating
$\lambda$ from the sample data. Referring to Table F, we find that we cannot reject the
null hypothesis that the data came from a Poisson distribution at the $\alpha = 0.05$
level of significance. We conclude, therefore, that the data are compatible with
the hypothesis ($p > 0.10$).

TABLE 11.3.6 Observed frequencies, probabilities, and expected frequencies,
Example 11.3.4

(1)              (2)                 (3)                       (4)                     (5)
Number of        Number of days      Probability, assuming     Expected frequencies,   $(O_i - E_i)^2/E_i$
cancellations    this number was     a Poisson distribution    col. (3) x 90
                 observed ($O_i$)    with $\lambda = 2.6$      ($E_i$)

0                  9                 0.074                      6.66                   0.822
1                 17                 0.193                     17.37                   0.008
2                 25                 0.251                     22.59                   0.257
3                 15                 0.218                     19.62                   1.088
4                 11                 0.141                     12.69                   0.225
5                  7                 0.074                      6.66                   0.017
6                  2                 0.032                      2.88                   0.269
7                  2                 0.012                      1.08
8                  2                 0.004                      0.36
9 or more          0                 0.001                      0.09
(7, 8, and 9 or
more combined)     4                 0.017                      1.53                   3.988

Total             90                 1.000                     90.00                   6.674
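The Poisson fit of Example 11.3.4 can be reproduced along the same lines. In this sketch the names `days` and `poisson_pmf` are assumptions made for illustration, and the critical value 12.592 is the standard tabulated $\chi^2$ value for $\alpha = 0.05$ and 6 degrees of freedom, which the example itself does not quote. The probabilities are computed directly rather than read to three decimals from Appendix Table B, so the statistic comes out close to, but not exactly, 6.674.

```python
import math

# Table 11.3.5: days on which x cancellations were received, x = 0, 1, ..., 8;
# the final entry is "9 or more".
days = [9, 17, 25, 15, 11, 7, 2, 2, 2, 0]
n_days = sum(days)

# The sample mean estimates lambda; the text rounds 233/90 to 2.6.
lam_hat = sum(x * d for x, d in enumerate(days)) / n_days
lam = round(lam_hat, 1)

def poisson_pmf(x, lam):
    """Poisson probability of exactly x occurrences."""
    return math.exp(-lam) * lam**x / math.factorial(x)

probs = [poisson_pmf(x, lam) for x in range(9)]
probs.append(1.0 - sum(probs))              # probability of "9 or more"
expected = [n_days * p for p in probs]

# Combine the last three categories (7, 8, 9 or more), as in Table 11.3.6,
# so that no expected frequency is below 1.
obs_c = days[:7] + [sum(days[7:])]
exp_c = expected[:7] + [sum(expected[7:])]

x2 = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))
df = len(obs_c) - 2    # less 1 for the total, less 1 for estimating lambda
critical = 12.592      # standard chi-square table value, alpha = 0.05, 6 df

print(f"lambda = {lam}, X^2 = {x2:.3f}, df = {df}, reject H0: {x2 > critical}")
```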

[Other goodness-of-fit tests are cited in a bibliography by Daniel (1980).] 


Exercises 



For each of the following exercises, perform the chi-square test at the indicated level of 
significance. Determine the p value. 

11.3.1 The following table shows the distribution of a sample of 223 employees by score
on an aptitude test. The mean and variance computed from the sample data are 74.47 and
386.4252, respectively. Test the goodness of fit of these data to a normal distribution. Let
$\alpha = 0.05$.

Score               Number of employees
< 40.0               10
40.0-49.9            12
50.0-59.9            17
60.0-69.9            37
70.0-79.9            55
80.0-89.9            51
90.0-99.9            34
100.0-109.9           5
110.0 or greater      2
Total               223



11.3.2 The manager of a variety store, during a 300-day observation period, finds that a
particular item is called for as follows. Test the goodness of fit of these data to a Poisson
distribution. Let $\alpha = 0.05$.

Number of times called for     0   1   2   3   4   5   6   7   8   9  10  11  12  13   Total
Number of days item was
called for this number
of times                       3   9  21  38  46  54  49  34  20  16   6   3   1   0    300



11.3.3 A certain clerical operation involves processing forms in batches of 20. A
supervisor examines a sample of 100 batches for errors, with the following results. Test the
goodness of fit of these data to a binomial distribution with $p = 0.20$. Let $\alpha = 0.05$.


Number of errors

No. of batches with this no.
of errors ($O_i$)


0 


2 3 4 5 6 7 


8 9 10 11 


1 5 12 15 30 21 8 4 2 


0 Total: 100 



11.3.4 A new 25-bed hospital shows the following experience relative to the number of
beds occupied during the first 134 days of operation. Test the goodness of fit of these data
to a binomial distribution. Let $\alpha = 0.05$.

No. of beds occupied   < 8   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23   Total
No. of days this no.
of beds occupied         0   1   1   2   6   8  17  22  24  21  15   9   5   1   1   1   0    134

11.3.5 Interviewers ask each of a sample of 300 shoppers which day he or she prefers to
grocery-shop. The results are as follows. Do these data indicate that all days are not equally
preferred by grocery shoppers? Let $\alpha = 0.05$.

Mon.   Tue.   Wed.   Thur.   Fri.   Sat.   Sun.   Total
10     20     40     40      80     60     50     300

11.3.6 A market researcher believes that in a certain population the proportions of persons
preferring brands A, B, C, and D of toothpaste are 0.30, 0.60, 0.08, and 0.02, respectively.
A simple random sample of 600 people drawn from the population reveals the following
preferences.

Brand                        A     B     C     D
No. preferring this brand   192   342    44    22

Do these data provide sufficient evidence to cast doubt on the researcher’s belief? Let
$\alpha = 0.01$.

11.3.7 Researchers ask each member of a random sample of 360 residents of a community
to list which local television station he or she prefers to watch for national and international
news. The results were as follows.

Station                        A     B     C
No. preferring this station   190    30   140

Do these data provide sufficient evidence to indicate that the three stations are not equally
preferred? Let $\alpha = 0.10$.


11.4 TESTS OF INDEPENDENCE 

One of the most frequent uses of $\chi^2$ is for testing the null hypothesis that two
criteria of classification, when applied to a population of subjects (or objects), are
independent. Two criteria of classification are said to be independent if the
distribution of one criterion in no way depends on the distribution of the other. If
two criteria of classification are not independent, there is an association between
the two criteria. Suppose, for example, that two criteria of classification are age
and type of TV program preferred. Suppose that, for some population of people,
knowing a person’s age is of no help in predicting the type of program preferred,
or vice versa. In such a case, age and type of program preferred are independent.

Typically, we make decisions about the independence of criteria in a population
on the basis of sample data. We draw a random sample from the population of
interest, and cross-classify the subjects according to the two criteria. We display
the cross-classification in a table, called a contingency table. The levels of one
criterion provide row headings, and the levels of the other criterion provide the
column headings. Table 11.4.1 shows a contingency table in which a sample of
$n$ subjects has been cross-classified according to two criteria. There are $r$ levels
of the criterion forming the rows and $c$ levels of the criterion forming the columns.
We place the observed number $O_{ij}$ of subjects that may be characterized by one
level of each criterion in the cell formed by the intersection of the $i$th row and $j$th
column. The cell entries are referred to as observed cell frequencies.

TABLE 11.4.1 Two-way classification of a sample of subjects

Second criterion     First criterion of classification: level
of classification:
level                1           2           3           ...    c           Total

1                    $O_{11}$    $O_{12}$    $O_{13}$    ...    $O_{1c}$    $n_{1.}$
2                    $O_{21}$    $O_{22}$    $O_{23}$    ...    $O_{2c}$    $n_{2.}$
3                    $O_{31}$    $O_{32}$    $O_{33}$    ...    $O_{3c}$    $n_{3.}$
...
r                    $O_{r1}$    $O_{r2}$    $O_{r3}$    ...    $O_{rc}$    $n_{r.}$

Total                $n_{.1}$    $n_{.2}$    $n_{.3}$    ...    $n_{.c}$    $n$

In order to find the expected cell frequencies needed to calculate an $X^2$ value,
we use the principles of probability. For the population represented by the sample
in Table 11.4.1, suppose that we wish to know the probability that a subject
picked at random is characterized by level 1 of the first (column) criterion. The
best estimator of this probability is $n_{.1}/n$. Similarly, the estimator of the
probability that a subject picked at random is characterized by level 1 of the second
(row) criterion is given by $n_{1.}/n$. We can compute similar estimators for each
level of each criterion. We call these probabilities marginal probabilities because
they are computed from marginal totals. Now suppose that we wish to estimate
the probability that a subject picked at random is characterized by level 1 of both
criteria. In the absence of any hypothesis about the two criteria of classification,
we estimate this probability by $O_{11}/n$. We call this a joint probability. Our
objective, however, is to test the null hypothesis that the two criteria of classification
are independent. Chapter 3 showed that if two events are independent, the
probability of their joint occurrence is equal to the product of their individual
probabilities. Applying this rule to a contingency table, we say the following:

If two criteria of classification are independent, a joint probability is equal to 
the product of the two corresponding marginal probabilities. 

Under the hypothesis of independence, then, the estimator of the probability that
a subject picked at random is characterized by level 1 of both criteria is given by
$(n_{1.}/n)(n_{.1}/n)$. We can find the probability for every other cell in the same way.

To convert any estimated or expected cell probability to an expected cell
frequency, we multiply the probability by $n$. The expected frequency for the first
cell, which is the expected number of subjects characterized by level 1 of both
criteria, is

$$E_{11} = \left(\frac{n_{1.}}{n}\right)\left(\frac{n_{.1}}{n}\right)n \qquad (11.4.1)$$

if the criteria are independent. The $n$ in the numerator will cancel one of the
denominator $n$'s, so that Equation 11.4.1 reduces to

$$E_{11} = \frac{(n_{1.})(n_{.1})}{n} \qquad (11.4.2)$$


This leads to the shortcut procedure for computing expected cell frequencies. We 
simply divide the product of corresponding marginal totals by n. 

Once we have computed the expected frequency for each cell, we compute $X^2$
by Equation 11.2.3. The analysis then proceeds as explained in Section 11.3. The
following example illustrates the chi-square test for independence.
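The shortcut rule and the $X^2$ computation translate directly into code. The sketch below is illustrative only: the function names `expected_frequencies` and `chi_square` are assumptions, and the counts are the observed frequencies of Example 11.4.1 (rows are areas of residence 1 to 3, columns are makes A to C).

```python
def expected_frequencies(table):
    """Expected cell frequencies under independence:
    (row total)(column total) / n for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Sum of (O - E)^2 / E over all cells of the contingency table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Observed frequencies from Example 11.4.1.
observed = [
    [52, 64, 24],
    [60, 59, 52],
    [50, 65, 74],
]

expected = expected_frequencies(observed)
x2 = chi_square(observed)

for row in expected:
    print([round(e, 2) for e in row])
print(f"X^2 = {x2:.3f}")
```

Working from unrounded expected frequencies gives a statistic that agrees with the example's 19.825 to about two decimal places.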


EXAMPLE 11.4.1 A market research firm wishes to know whether they can conclude
that, for adults in a certain city, the make of car driven is associated with the
driver’s area of residence. A random sample of 500 adult drivers is interviewed
to determine what make of car they drive and in what area of the city they live.
Table 11.4.2 shows the results.

The appropriate hypotheses are as follows:

$H_0$: Make of car driven and area of residence are independent
$H_1$: The two criteria of classification are not independent

Let the level of significance $\alpha$ be 0.05.

In order to determine the critical value of $\chi^2$, we must decide how many degrees
of freedom are at our disposal. In general, for a contingency table consisting of
$r$ rows and $c$ columns, the number of degrees of freedom equals $(r - 1)(c - 1)$.
We can justify this method of calculating the degrees of freedom intuitively.
Consider any $r \times c$ contingency table with fixed marginal totals. Within any
column, we have complete freedom in assigning numbers to $r - 1$ of the cells
of that column. Once we have filled these $r - 1$ cells, the frequency of the $r$th
cell is automatically determined, since the sum of the cell frequencies must equal
the fixed total. In other words, we have no freedom in assigning a number to the
last cell. Similarly, for any row, we are free to assign numbers to only $c - 1$ of
the cells in that row. Thus we can assign numbers to only $(r - 1)(c - 1)$ of
the $rc$ cells in the table. Hence the degrees of freedom associated with the table
are $(r - 1)(c - 1)$.

TABLE 11.4.2 Five hundred drivers classified according to make of car driven and
area of residence

                     Make of automobile
Area of residence    A      B      C      Total

1                     52     64     24    140
2                     60     59     52    171
3                     50     65     74    189

Total                162    188    150    500

In the present example, $r = 3$ and $c = 3$, so that the number of degrees of
freedom available is $(3 - 1)(3 - 1) = 4$. The critical value of $\chi^2_{(0.95,\,4)}$ is 9.488.
The rejection region, then, consists of values of $X^2$ equal to or greater than 9.488.

Using the data from Table 11.4.2 and the shortcut rule, we compute the
expected cell frequencies as follows:

$$
\begin{aligned}
E_{11} &= \frac{(162)(140)}{500} = 45.36 & E_{12} &= \frac{(188)(140)}{500} = 52.64 & E_{13} &= \frac{(150)(140)}{500} = 42.00 \\
E_{21} &= \frac{(162)(171)}{500} = 55.40 & E_{22} &= \frac{(188)(171)}{500} = 64.30 & E_{23} &= \frac{(150)(171)}{500} = 51.30 \\
E_{31} &= \frac{(162)(189)}{500} = 61.24 & E_{32} &= \frac{(188)(189)}{500} = 71.06 & E_{33} &= \frac{(150)(189)}{500} = 56.70
\end{aligned}
$$

We can display the observed and expected frequencies together in a table such
as Table 11.4.3, where the expected frequencies are enclosed in parentheses. From
the data in the table, we compute the following value of the test statistic:

$$X^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(52 - 45.36)^2}{45.36} + \frac{(60 - 55.40)^2}{55.40} + \cdots + \frac{(74 - 56.70)^2}{56.70} = 19.825$$

Since the computed value of $X^2$ is greater than 9.488, we reject the null
hypothesis. We conclude that in the city in which the study was conducted, the area of
residence and the make of car a person drives are related ($p < 0.005$).

TABLE 11.4.3 Observed and expected frequencies, Example 11.4.1

                     Make of automobile
Area of residence    A             B             C             Total

1                    52 (45.36)    64 (52.64)    24 (42.00)    140
2                    60 (55.40)    59 (64.30)    52 (51.30)    171
3                    50 (61.24)    65 (71.06)    74 (56.70)    189

Total                162           188           150           500

Small Expected Frequencies In contingency-table analysis, some cells may yield
small expected frequencies. This poses a possible threat to the validity of a
chi-square test. Writers disagree as to how to handle this problem. However, many
follow Cochran (1954), who says that for tables with more than 1 degree of
freedom, a minimum expected frequency of 1 per cell is permissible if no more
than 20% of the cells have expected frequencies of less than 5. We may combine
adjacent rows and/or columns to satisfy this rule, so long as this does not violate
the logic of the classification scheme.

The 2 × 2 Contingency Table When each criterion of classification has two levels,
the resulting contingency table has two rows and two columns. Such a table is
called a 2 × 2 or four-fold contingency table. Table 11.4.4 shows a typical 2 × 2
contingency table, where $a$, $b$, $c$, and $d$ are the observed frequencies in the four
cells. We compute the $X^2$ value for a 2 × 2 contingency table by the following
shortcut formula:

$$X^2 = \frac{n(ad - bc)^2}{(a + c)(b + d)(a + b)(c + d)} \qquad (11.4.3)$$

This yields the same numerical result as Equation 11.2.3.

TABLE 11.4.4 A 2 × 2 contingency table

Second criterion     First criterion of classification
of classification    1          2          Total

1                    a          b          a + b
2                    c          d          c + d

Total                a + c      b + d      n

The 2 × 2 contingency table is not immune to the problem of small expected
frequencies. Again the recommendations of Cochran (1954) are often followed.
According to Cochran, we should not use the $\chi^2$ test with 2 × 2 contingency
tables if $n < 20$, or if $20 < n < 40$ and any expected frequency is less than 5.
If $n$ is greater than 40, no expected frequency in a 2 × 2 contingency table should
be less than 1. For samples in which the minimum expected cell frequencies are
not achieved, a possible alternative to the chi-square test is the Fisher exact test
[described by Daniel (1978)].

EXAMPLE 11.4.2 A market research firm investigating the success of its
interviewers finds that 176 out of 225 interviews attempted by trained interviewers are
successfully completed. Of 310 interviews attempted by untrained interviewers,
188 are successfully completed. Table 11.4.5 displays these data as a 2 × 2
contingency table.

TABLE 11.4.5 Interviews by outcome and training of interviewer

Outcome of       Interviewer training
interview        Trained    Untrained    Total

Successful       176        188          364
Unsuccessful      49        122          171

Total            225        310          535

The firm wishes to know whether these data provide sufficient evidence at the
0.05 level of significance to indicate a relationship between the training level of
interviewers and the outcome of attempted interviews.

By Equation 11.4.3, we may compute

$$X^2 = \frac{535[(176)(122) - (188)(49)]^2}{(225)(310)(364)(171)} \approx 18.52$$

The computed value of $X^2$ exceeds the critical value of $\chi^2$ in Table F for 1 degree
of freedom and $\alpha = 0.05$. We conclude, then, that the two criteria are related
($p < 0.005$).
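The claim that Equation 11.4.3 gives the same result as the cell-by-cell formula can be checked numerically with the data of Example 11.4.2. The two function names below are assumptions made for this sketch; the cell labels $a$, $b$, $c$, $d$ follow the layout of Table 11.4.4.

```python
def chi_square_shortcut(a, b, c, d):
    """Equation 11.4.3 for a 2 x 2 table with cells a, b (row 1) and c, d (row 2)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

def chi_square_cellwise(a, b, c, d):
    """Sum of (O - E)^2 / E over the four cells, with each expected
    frequency taken as (row total)(column total) / n."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            total += (observed[i][j] - e) ** 2 / e
    return total

# Table 11.4.5: a = successful/trained, b = successful/untrained,
# c = unsuccessful/trained, d = unsuccessful/untrained.
shortcut = chi_square_shortcut(176, 188, 49, 122)
cellwise = chi_square_cellwise(176, 188, 49, 122)

print(f"shortcut = {shortcut:.4f}, cell by cell = {cellwise:.4f}")
```

The two values agree up to floating-point error, which is the sense in which the shortcut formula "yields the same numerical result."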


Yates' Correction The observed frequencies are discrete in situations in which
the chi-square test is used, and the chi-square distribution used in the test is
continuous. We often “correct” for this for a 2 × 2 contingency table. The
correction procedure, conceived by Yates (1934), is usually called Yates'
correction. This procedure, used only in the 2 × 2 case, involves subtracting 0.5 from
the absolute value of the difference between observed and expected frequencies
before squaring. When Yates’ correction is used, Equation 11.4.3 becomes

$$X^2_{\text{corrected}} = \frac{n(|ad - bc| - 0.5n)^2}{(a + c)(b + d)(a + b)(c + d)} \qquad (11.4.4)$$

Applying the correction procedure to the present example, we have

$$X^2_{\text{corrected}} = \frac{535[|(176)(122) - (188)(49)| - 0.5(535)]^2}{(225)(310)(364)(171)} \approx 17.72$$

As we might expect with numbers this large, the effect is not dramatic. Although
the correction procedure has the greatest effect when $n$ is less than 100, in the
past the procedure has been recommended for general use in the 2 × 2 case.
Several writers, however, notably Grizzle (1967), Lancaster (1949), Pearson (1947),
and Plackett (1964), have questioned the use of Yates’ correction. Grizzle, for
example, has suggested that its use will result in the hypothesis not being rejected
as often as it should be. [Grizzle’s findings are supported by those of Conover
(1974) and Camilli and Hopkins (1978).] Based on the findings of these writers,
some practitioners now recommend that the correction not be used.
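The size of the correction for the interviewer data can be seen by evaluating both forms side by side. This is a minimal sketch: the function `chi_square_2x2` and its `yates` flag are names assumed here, and the function applies Equation 11.4.4 literally, so it is meant for tables in which $|ad - bc|$ exceeds $0.5n$.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """X^2 for a 2 x 2 table by Equation 11.4.3, or by Equation 11.4.4
    when yates=True (0.5n is subtracted from |ad - bc| before squaring)."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff -= 0.5 * n
    return n * diff ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

# Data of Example 11.4.2 (Table 11.4.5).
uncorrected = chi_square_2x2(176, 188, 49, 122)
corrected = chi_square_2x2(176, 188, 49, 122, yates=True)

print(f"uncorrected X^2 = {uncorrected:.3f}")
print(f"corrected   X^2 = {corrected:.3f}")
```

With $n = 535$ the correction lowers the statistic by less than 1, and both values remain far above the 1-degree-of-freedom critical value, consistent with the remark that the effect is not dramatic here.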


Exercises 



Perform each test at the indicated level of significance. Determine the p value. 

11.4.1 During a market-research survey, a firm obtains information on the education and
socioeconomic status of 375 heads of households. In the following table the respondents
are cross-classified by the two criteria. Test at the 0.05 level the null hypothesis that
socioeconomic status and education are independent.

                 Education
Socioeconomic    < 8       8-12      Non-college,        Some       College    Total
status           grades    grades    post-high-school    college    degree
                                     training

1                10         7         3                   4          1          25
2                14        10         7                   4          2          37
3                 9        25        13                  18          3          68
4                 7         9        38                  44          6         104
5                 3         8        14                  18         62         105
6                 2         3         8                  10         13          36

Total            45        62        83                  98         87         375

11.4.2 A government agency surveys unemployed persons who are seeking work. It
prepares the following tabulation, by sex and skill level, of 532 interviewees. Do these data
provide sufficient evidence to indicate that skill status is related to sex? Let $\alpha = 0.01$.

               Sex
Skill level    Male    Female    Total

Skilled        106       6       112
Semiskilled     93      39       132
Unskilled      215      73       288

Total          414     118       532


11.4.3 A guidance counselor asks a group of 110 junior high school students how much
time they spend reading books and how much time they spend watching television. The
students are then classified as high or low with respect to each activity. The following
table shows the number of students in each cell when cross-classified. Do these data provide
sufficient evidence to suggest, at the 0.05 level of significance, that the amounts of book
reading and television viewing are related?

              Book reading
Television
viewing       Low    High    Total

Low           11     41       52
High          18     40       58

Total         29     81      110


11.4.4 A sample of 165 defective items produced in two factories operated by the same
company are classified according to whether the defect is due to poor workmanship or
inferior material. The data are shown in the following table. Test at the 0.05 level the null
hypothesis that cause of defect and factory of production are independent.

                     Factory
Cause of defect      A     B     Total

Poor workmanship     21    72     93
Inferior material    46    26     72

Total                67    98    165


11.4.5 In a transportation survey, 750 people employed in the downtown area of a large
city are interviewed. They are cross-classified by their type of dwelling unit and their mode
of travel to and from work. We want to see if there is a dependence between the two
variables. The following table shows the number of interviewees falling into each cell after
they are cross-classified. Test $H_0$: type of dwelling unit and mode of transportation to work
are independent. Let $\alpha = 0.05$.

                         Type of dwelling unit
Mode of                  Single-    Two-      Multiple-    Other    Total
transportation           family     family    family

Automobile               148        140       102           97      487
Public transportation     49         52        64           60      225
Other                      7          8         9           14       38

Total                    204        200       175          171      750


11.5 TESTS OF HOMOGENEITY 

We often want to explore the proposition that several populations are homogeneous 
with respect to some characteristic. We may, for example, wish to know whether 
people in several age groups have the same television-viewing habits. Or we may 
want to know whether shoppers from different socioeconomic backgrounds have 
different reasons for buying a certain product. Finally, we may want to know 
whether some raw material available from several vendors is homogeneous in 
quality. 

Another way of stating the problem is to say that we are interested in testing 
the null hypothesis that several populations are homogeneous with respect to the 
proportion of subjects falling into several categories, or levels, of some criterion 
of classification. We may test the hypothesis by means of a chi-square test of 
homogeneity. We draw a random sample from each of the populations of interest. 
Then we find the number in each sample falling into each category. We can display 
the sample data in a contingency table like Table 11.4.1, in which the populations 
are one of the criteria of classification and the characteristic of interest is the other. 
In a contingency table constructed from sample data collected in this manner, 
either the rows or the columns, depending on which we use to indicate the different 
populations, are fixed. The reason is that we determine the sample size before we 
obtain knowledge about the characteristic of interest. The analytical procedure for 
performing a chi-square test of homogeneity is identical to that in Section 11.4 
for performing a chi-square test of independence. 

The difference between the chi-square test of homogeneity and the chi-square 
test of independence comes in the sampling procedure, the rationale underlying 
the calculation of expected frequencies, and the interpretation of results. 

When we are using the chi-square test of independence, the typical sampling 
procedure is to select a single sample from a population, then cross-classify the 
sample entities on the basis of two criteria. When we are using the chi-square test 
of homogeneity, we identify the two or more populations of interest in advance 
and draw a sample from each. We then place the entities of the resulting samples 
in the various categories of the single variable of interest. 



TABLE 11.5.1 Three samples of persons classified by type of television program preferred (expected frequencies in parentheses)


We saw in Section 11.4 that, for the chi-square test of independence, the 
rationale underlying the calculation of expected frequencies is based on the probability
of the joint occurrence of independent events. The rationale underlying the
calculation of expected frequencies for the test of homogeneity is based on the 
assumption that if the sampled populations are homogeneous, we can get the best 
estimate of the probability that a member of a given population falls into a given 
category of the variable of interest by pooling the information from the available 
samples. 

Suppose, for example, that the sampled populations are male and female employees
of a certain firm, and the variable of interest is attitude toward management.
Assume that there are two categories of the variable: satisfied with management
and dissatisfied with it. Suppose that in a sample of 100 men, 30 are
satisfied with management, whereas in a sample of 110 women, 45 are satisfied.
If, as hypothesized, men and women are homogeneous with respect to satisfaction, 
we can pool the two samples and consider them a single sample from the same 
population with respect to the variable of interest. We would get the best estimate 
of the true proportion of men and the true proportion of women who are satisfied, 
then, by pooling the sample information. This gives (30 + 45)/(100 + 110) = 
0.3571. Applying this proportion to the sample of 100 men gives an expected 
frequency of (100)(0.3571) = 35.71 satisfied men. Applying the proportion to 
the sample of 110 women yields an expected frequency of (110)(0.3571) = 39.28 
satisfied women. 
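
The pooling arithmetic in this illustration can be checked with a short sketch. The function name `pooled_expected` is ours, introduced only for illustration. Note that the sketch carries the pooled proportion at full precision, so the expected count for women prints as 39.29 rather than the 39.28 obtained above from the rounded proportion 0.3571.

```python
def pooled_expected(satisfied_counts, sample_sizes):
    """Pool several samples under the homogeneity hypothesis.

    Returns the pooled proportion and the expected number in the
    category of interest for each sample.
    """
    pooled = sum(satisfied_counts) / sum(sample_sizes)
    expected = [n * pooled for n in sample_sizes]
    return pooled, expected


# 30 of 100 men and 45 of 110 women are satisfied with management.
pooled, expected = pooled_expected([30, 45], [100, 110])
print(round(pooled, 4))                  # pooled proportion
print([round(e, 2) for e in expected])   # expected satisfied men, women
```

The two expected frequencies necessarily add up to the 75 satisfied employees actually observed, which is a useful check on hand calculations.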

Note that the chi-square test of homogeneity is an extension of the procedure 
for testing hypotheses about the difference between two population proportions 
(Chapter 7). When there are two populations involved and the characteristic of 
interest has two categories, the chi-square test of homogeneity is an alternative 
and equivalent way of testing the null hypothesis that two population proportions 
are equal. 

EXAMPLE 11.5.1 A market analyst wants to know whether different age groups 
differ in the types of television programs they prefer. A random sample is selected 
from each of three age groups. Each person is asked to specify which of three 
types of TV program he or she prefers. Table 11.5.1 shows the results, with 
expected frequencies in parentheses. 

Population                    Type of program
age group            A              B              C          Total

Under 30          120 (70)      30 (67.50)     50 (62.50)      200
30-44              10 (35)      75 (33.75)     15 (31.25)      100
45 and over        10 (35)      30 (33.75)     60 (31.25)      100
Total             140           135            125             400

We find the expected frequencies by applying the rationale underlying the test
of homogeneity. If the three sampled populations are homogeneous with respect
to program preference, the best estimate of the true proportion of subjects preferring
Type A in each age group is given by 140/400 = 0.35. To find the expected
frequency for Type A preference in each age group, we multiply each sample
total by 0.35. Thus (200)(0.35) = 70, (100)(0.35) = 35, and (100)(0.35) = 35.
Similar reasoning yields the other two columns of expected frequencies shown in
Table 11.5.1. The appropriate hypotheses are:

H₀: The three age groups are homogeneous with respect to type of
TV program preferred

H₁: The three age groups are not homogeneous


Let α = 0.05. From the data in Table 11.5.1, we may compute

$$X^2 = \frac{(120 - 70)^2}{70} + \frac{(10 - 35)^2}{35} + \cdots + \frac{(60 - 31.25)^2}{31.25} = 180.495$$


From Table F, the critical value of χ² for α = 0.05 and 4 degrees of freedom is
9.488. Since the computed value, 180.495, is larger than 9.488, we reject H₀ and
conclude that the populations are not homogeneous with respect to the type of TV
program preferred (p < 0.005).
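
The computations of Example 11.5.1 can be reproduced with the following sketch, which uses only the Python standard library. The function names are ours and purely illustrative. The p value uses the closed-form upper-tail probability of the chi-square distribution, which is available only when the degrees of freedom are even (as they are here, with 4); it is not a general-purpose routine.

```python
import math


def chi_square_homogeneity(table):
    """table: one row per sampled population, one column per category.

    Returns the X^2 statistic, its degrees of freedom, and the
    expected frequencies obtained by pooling the samples.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Pooled proportion for a category is col_total / n; apply it to each sample.
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, expected


def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) for even df, via the closed-form series."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires even, positive df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Observed frequencies from Table 11.5.1 (rows: Under 30, 30-44, 45 and over).
observed = [[120, 30, 50], [10, 75, 15], [10, 30, 60]]
stat, df, expected = chi_square_homogeneity(observed)
p_value = chi2_upper_tail_even_df(stat, df)

print(round(stat, 3), df)   # statistic and degrees of freedom
print(p_value < 0.005)      # consistent with p < 0.005 in the text
print(stat > 9.488)         # exceeds the critical value from Table F
```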

The chi-square test of homogeneity applied to a 2 × 2 contingency table is
equivalent to a test of the equality of two population proportions. To show this,
let us refer to Example 7.9.1. We can display the data from that example in the
2 × 2 contingency table of Table 11.5.2. By Equation 11.4.3, we have

$$X^2 = \frac{400[(54)(123) - (171)(52)]^2}{(106)(294)(175)(225)} = 1.65$$

Since 1.65 < 3.841, we cannot reject the null hypothesis that the two populations
are homogeneous. Recall that in Chapter 7 we computed a z value of −1.28,
which did not allow us to reject H₀: p₁ = p₂. Note that (−1.28)² = 1.64, which,
except for rounding error, is equal to our computed value of X². Also note that
the critical value of χ² = 3.841 is equal to the square of the critical value of
z = 1.96. These results shouldn’t be surprising in light of the discussion in Section
11.2. We emphasize that these relationships hold only for a 2 × 2 contingency
table in which X² has 1 degree of freedom.
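
The equivalence can be verified numerically with the sketch below. Two assumptions are worth stating: the shortcut formula is written in the form that matches the numbers substituted above, and the z statistic is computed with the pooled estimate of the common proportion in the standard error, which we take to be the form used in Example 7.9.1. The function names are ours.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Shortcut X^2 for a 2 x 2 table with rows (a, b) and (c, d)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))


def pooled_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Table 11.5.2: rural (54 chain, 171 nonchain), urban (52 chain, 123 nonchain).
x2 = chi_square_2x2(54, 171, 52, 123)
z = pooled_z(54, 225, 52, 175)

print(round(x2, 2), round(z, 2))   # X^2 and z
print(abs(x2 - z ** 2) < 1e-9)     # z squared equals X^2 up to rounding
```

Carrying z at full precision removes the small discrepancy between 1.64 and 1.65 noted above, which arises only because z was rounded to −1.28 before squaring.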

We handle the problem of small expected frequencies in the case of the test of 
homogeneity in the way suggested in Section 11.4 for tests of independence. The 
analysis of 2 × 2 contingency tables is also the same for both tests.


TABLE 11.5.2 Contingency-table display of data from Example 7.9.1

                   Type of store
Background      Chain     Nonchain     Total

Rural             54        171         225
Urban             52        123         175
Total            106        294         400



Exercises 


Perform each test at the indicated level of significance. Determine the p value. 

11.5.1 In a study to evaluate instruction in technical writing, a group of firms submit 110
pieces of technical writing done by members of their staffs who have had training in
technical writing. The firms also submit 120 pieces of technical writing by staff members
without training in technical writing. A panel of judges rates each article as superior,
acceptable, or inferior. The following table shows the number of articles from each group
falling into each category. Do the data provide sufficient evidence to indicate that the
samples came from different populations? Let α = 0.05.

Previous                     Rating
training      Superior    Acceptable    Inferior    Total

Yes              48           39           23        110
No               12           36           72        120
Total            60           75           95        230



11.5.2 A national firm has a large plant in each of three sections of the country. Six 
months after a change in working conditions and employee benefits is introduced at the 
three factories, 250 employees from each plant are randomly selected and asked to rate
the degree of their satisfaction with the new system. The following table shows the results. 
Do these data provide sufficient evidence at the 0.05 level of significance to suggest that 
the employees at the different plants are not homogeneous with respect to satisfaction with 
the new system? 


                           Degree of satisfaction
              Very                                      Very
Factory     satisfied    Satisfied    Dissatisfied    dissatisfied    Total

1              135           70            25              20          250
2              145           80            15              10          250
3              140           75            20              15          250
Total          420          225            60              45          750


Summary

In this chapter you used the chi-square distribution to analyze frequency data. The
chapter presented three types of test: tests of goodness of fit, tests of independence,
and tests of homogeneity. It also presented methods of handling small expected
frequencies.

You learned that the test of goodness of fit is useful for testing the null hypothesis
that data available for analysis were drawn from a population that follows
some specified distribution, such as the binomial, the Poisson, the normal, or the
uniform. Sufficiently poor agreement between observed frequencies and the frequencies
we would expect if the null hypothesis is true is taken as an indication
that the null hypothesis is false.

The tests of independence and homogeneity are alike in that the arithmetic
involved in calculating expected frequencies and the test statistic is the same. The
two tests differ, however, with respect to the hypothesis they test, the rationale
underlying the computation of expected frequencies, the way in which the data are
typically collected, and the interpretation of the results.

The test of independence is concerned with the relationship between two variables
in a single population, whereas the test of homogeneity is concerned with
whether or not categories of a single variable are represented in the same proportions
in two or more populations.

The rationale underlying the calculation of expected frequencies for the test of 
independence is based on a probability law that states that if two events are 
independent, the probability of their joint occurrence is equal to the product of 
the probabilities of the individual occurrences of the two events. The rationale 
underlying the calculation of expected frequencies for the test of homogeneity is 
as follows: If, as stated in the null hypothesis, the populations are homogeneous 
with respect to some variable, we can obtain the best estimates of the probabilities 
that sample entities fall into the various categories of the variable by pooling the 
sample data. 
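
The two rationales can be written side by side to show why the arithmetic coincides. The notation here is ours, introduced only for this illustration: $n_{i\cdot}$ is the total of row $i$, $n_{\cdot j}$ the total of column $j$, and $n$ the grand total.

```latex
% Independence: n times the product of the two estimated marginal probabilities
E_{ij} = n \left(\frac{n_{i\cdot}}{n}\right)\left(\frac{n_{\cdot j}}{n}\right)
       = \frac{n_{i\cdot}\, n_{\cdot j}}{n}

% Homogeneity (populations as rows): sample size times the pooled
% estimate of the proportion falling in category j
E_{ij} = n_{i\cdot} \left(\frac{n_{\cdot j}}{n}\right)
       = \frac{n_{i\cdot}\, n_{\cdot j}}{n}
```

Both lines reduce to the same expression, row total times column total divided by the grand total, even though the reasoning that leads to it differs.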

When we anticipate a test of independence, we select a random sample from a 
single population, then cross-classify the sample entities according to the two 
variables of interest. For the test of homogeneity, we identify two or more populations
and draw a sample from each. We then classify sample entities in each
sample on the basis of the single variable of interest. 

We interpret the results of tests of independence and homogeneity, of course, 
in terms of independence and homogeneity, respectively. 


Review Questions 



1. How can a chi-square distribution be derived from a normal distribution? 

2. What are the mean and variance of a chi-square distribution?

3. How do you compute degrees of freedom for the chi-square goodness-of-fit test?

4. State Cochran’s rule for handling small expected frequencies in (a) goodness-of-fit
tests, (b) contingency tables in general, (c) 2 × 2 contingency tables.

5. What is a contingency table?

6. How are degrees of freedom computed for contingency tables?

7. Explain the rationale behind the method of computing the expected frequencies in a 
test of independence. 

8. Explain the difference between a test of independence and a test of homogeneity. 

9. Over a period of three months, a firm observes the following distribution of employee
terminations. Test the goodness of fit of these data to a Poisson distribution using the χ²
test. Let α = 0.05.

Number of terminations                               0    1    2    3    4    5    6
No. of days this no. of terminations was observed    7   17   28   20   10    6    2

10. The following is a frequency distribution of the difference between the yearly high
and low stock prices for a sample of 115 stocks for a recent year. Test the null hypothesis
that these data come from a population of normally distributed values. Let α = 0.05.
Determine the p value.

Difference           Number of stocks

0-4.999                      3
5.000-9.999                 27
10.000-14.999               35
15.000-19.999               25
20.000-24.999                8
25.000-29.999               10
30.000-34.999                4
35.000-39.999                1
40.000-44.999                2
Total                      115



11. In a certain firm the health records of 280 employees who were ill during a year show 
the following distribution by sex and severity of illness. Do these data provide sufficient 
evidence at the 0.05 level of significance to indicate that sex and severity of illness are 
related? Determine the p value. 


Severity of                      Sex
illness                  Male    Female    Total

Nondisabling              90       90       180
Disabling but mobile      36       25        61
Confined to bed           24       15        39
Total                    150      130       280



12. In a study of the price-quality relationship of certain household products, a market
research firm had a group of homemakers test 135 products and rate them as poor, mediocre,
or superior. The following table shows the 135 products cross-classified by the
homemakers’ ratings and the price category. Test at the 0.05 level the null hypothesis that
price and quality are unrelated. Determine the p value.

                   Price category
Rating        Low    Medium    High    Total

Poor           15       8        7       30
Mediocre       10      40       14       64
Superior        5      12       24       41
Total          30      60       45      135


13. A researcher selects a random sample from the regular customers of each of three
shopping centers. The researcher then determines the distance of residence from the shopping
center. The following table shows the distribution by distance of residence for the
500 customers of the three shopping centers. Do these data provide sufficient evidence to
suggest that these shopping centers are not homogeneous with respect to distances of
residences of their regular shoppers? Let α = 0.05. Determine the p value.

Distance of            Shopping center
residence            A       B       C     Total

0-5.0               110      80      87     277
5.1-20.0             40      55      57     152
20.1 and over        20      25      26      71
Total               170     160     170     500


14. In a study of outdoor recreation, researchers select a random sample from among the 
adult males working in a metropolitan area. Subjects are classified on the basis of their 
occupation and the extent to which they pursue outdoor activities such as hunting, fishing, 
camping, and outdoor sports. The results are as given in the following table. Do these 
data provide sufficient evidence to indicate a lack of independence between occupation 
and extent of participation in outdoor activities? Let α = 0.05, and determine the p value.

                                  Extent of participation
                       Seldom                                    Very
Occupation            or never    Occasionally    Frequently    frequently    Total

Professional             10            10             20            60         100
Executive                10            10             25            65         110
Other white collar       25            30             30            65         150
Skilled labor            25            20             35            40         120
Unskilled labor          45            40             25            20         130
Total                   115           110            135           250         610


15. Market researchers select samples of adults from each of two communities. Respondents
are asked to indicate the extent to which they are satisfied with the shopping facilities
available to residents of their community. The results are as given in the following table.
Do these data provide sufficient evidence to indicate a lack of homogeneity between the
two communities with regard to extent of residents’ satisfaction with available shopping
facilities? Let α = 0.05, and determine the p value.

Degree of               Community
satisfaction            A       B

Very satisfied          40      60
Satisfied               70      90
Dissatisfied            60      30
Very dissatisfied       30      20
Total                  200     200


16. In a taste test to study the acceptability of a coffee substitute, each member of a panel
of 25 coffee drinkers is given coffee (C) and the coffee substitute (S) in six different taste
tests. The panelists indicate which of the two they prefer. They are not told which is coffee
and which is the coffee substitute. The order in which the two beverages are presented to
each panelist is randomized at each test. The results are as given in the following table.
Test the null hypothesis that the number of times coffee is chosen follows a binomial
distribution. Let α = 0.05, and determine the p value.

                    Taste test              Number of times
Panelist     1   2   3   4   5   6           C preferred

 1           C   C   S   S   S   S                2
 2           S   S   C   C   C   S                3
 3           C   C   C   S   C   C                5
 4           S   S   C   S   C   S                2
 5           S   C   S   S   C   S                2
 6           S   C   S   S   S   S                1
 7           S   S   S   S   S   C                1
 8           S   S   S   C   S   C                2
 9           S   S   S   S   S   S                0
10           S   S   C   S   S   C                2
11           S   S   S   S   C   C                2
12           C   S   S   S   S   S                1
13           C   S   S   C   S   C                3
14           S   S   S   C   C   S                2
15           S   C   C   C   C   S                4
16           C   C   S   S   S   C                3
17           S   C   S   S   S   S                1
18           C   C   S   S   C   C                4
19           C   S   S   S   S   S                1
20           C   S   C   S   S   S                2
21           C   C   S   S   S   C                3
22           S   S   S   S   C   S                1
23           C   C   S   S   C   C                4
24           S   S   S   C   S   S                1
25           C   S   S   C   S   S                2

17. In a study of public transportation in a certain city, researchers hypothesized that the
distribution of the number of empty seats on buses arriving at a certain point during rush
hour follows a Poisson distribution with parameter 2. A check of a random selection of
30 of these buses yields the following results. Do the observed data support the hypothesis?
Let α = 0.05, and determine the p value.

Number of empty seats    0   1   2   3   4   5   6   7   8   9 or more
Observed frequency       1   6   5   6   5   3   2   1   1   0

18. An economist selects a sample of 200 executives from each of five industries. A
questionnaire mailed to each executive asks: “Do you believe the rate of inflation will be
higher during the coming year than last year?” The following table shows the results. Can
we conclude that there is a lack of homogeneity among industries with regard to the
executives’ opinions on inflation? Let α = 0.05.

Response to                  Industry
question         A       B       C       D       E

Yes             150     100      75     170      90
No               50     100     125      30     110
Total           200     200     200     200     200


19. The following table shows the number of people boarding an elevator during 10-minute
intervals. Can we conclude from these data that the sample is from a Poisson
distribution? Let α = 0.05, and determine the p value.

Number of persons (x)           0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
No. of periods with x persons   5  10  15  18  20  25  33  28  24  20  17  12   8   4   2   1    Total: 242




20. A firm has found that about 85% of its accounts receivable are paid on time, 8% are
paid after becoming 1 through 30 days delinquent, 6% are paid within the 31- through 60-day
delinquency period, and 1% are paid after becoming 60 days delinquent or are not
paid at all. A random sample of 500 of this year’s accounts reveals the following payment
status: Paid on time, 405; 1-30 days delinquent, 50; 31-60 days delinquent, 40; over 60
days delinquent, 5. Does it appear that current accounts receivable conform to the prior
payment experience? Let α = 0.05, and determine the p value.

21. In a certain area the population consists of four ethnic groups in the following proportions:

Ethnic group                  A       B       C       D
Proportion of population     0.13    0.05    0.08    0.74

A survey of 1000 persons in the area employed in managerial and professional positions
gives the following results:

Ethnic group      A      B      C      D
Number           145     45     75    735     Total: 1000

Do these data indicate that the ethnic-group composition of managers and professionals in
the area is different from the ethnic-group composition of the area’s population? Let α =
0.05, and determine the p value.

22. Market researchers select a random sample from each of two populations of recent
purchasers of new cars. Subjects in Population A have recently bought a Make A car, and
those in Population B have recently bought a Make B car. Subjects are asked to indicate
the degree to which they are satisfied or dissatisfied with their new car. The results are as
given in the following table. Do these data provide sufficient evidence to indicate a lack
of homogeneity between the two groups with respect to degree of satisfaction or dissatisfaction?
Let α = 0.05, and determine the p value.

Degree of                        Make of automobile purchased
satisfaction/dissatisfaction           A            B

Extremely satisfied                    62           28
Mildly satisfied                       24           18
Neutral                                20           28
Mildly dissatisfied                     8           10
Extremely dissatisfied                  6           16
Total                                 120          100


23. A team of industrial psychologists selects a random sample from among the middle- 
and lower-echelon white-collar workers in a certain industry. Subjects are asked to indicate 
the strength of their expectations of achieving a top-level position in their firms. The results 
by age of respondent are shown in the following table. Do these data provide sufficient 
evidence to indicate a relationship between age and strength of expectation of achieving a 
top-level position? Let α = 0.05, and determine the p value.

                   Strength of Expectation
Age            Low     High    Uncertain    Total

<25             10      95        20         125
25-34           20     110        20         150
35-44           20      60        20         100
44 or more      50      40        35         125
Total          100     305        95         500

24. A sociologist selects a sample of 200 ads from the media available in a city. The ads
are then categorized on the basis of sex orientation and content. The results are as follows.
Do these data provide sufficient evidence to indicate a lack of independence between the
two variables? Let α = 0.05.

                       Sex orientation
Content        Masculine    Feminine    Neutral    Total

Adventure         60           10          5         75
Domestic           5           50         10         65
Nostalgic         15           10         35         60
Total             80           70         50        200

25. Each of a sample of 400 adults residing in the home city of a professional football
team is asked, “Do you think this city should continue to be the home of a professional
football team?” The responses by age of respondent are as shown in the following table.
Do these data suggest a lack of independence between age and attitude toward the professional
football team? Let α = 0.01, and determine the p value.

Age of                 Response
respondent         Yes      No     Total

Under 30            50      75      125
30-49              130      45      175
50 and older        60      40      100
Total              240     160      400


26. Market researchers conduct a study of shoppers at the four leading department stores
of a city. They select independent random samples of shoppers who shop primarily at each
store. The following table shows the age distribution of shoppers. Do these data provide
sufficient evidence to indicate a lack of homogeneity with respect to age among the sampled
populations? Let α = 0.05, and determine the p value.

                        Store
Age group       A      B      C      D     Total

<25             80     40     90     50     260
25-35           60     60     80     40     240
36-45           30     50     45     75     200
Over 45         30     50     35     85     200
Total          200    200    250    250     900



27. An ecologist investigates the extent to which officials of industrial firms in a certain 
area are concerned with control of environmental pollution. One of the questions asked 
is, “Does your firm have a policy regarding the implementation and/or maintenance of 
environmental pollution-control measures?” The following table shows responses by major 
industry type. Do these data suggest a lack of independence between type of industry and 
the presence or absence of a policy on pollution-control measures? Let α = 0.01, and
find the p value.

                      Response
Industry type      Yes     No     Total

A                    5     35       40
B                   15     15       30
C                   10     15       25
D                   15     45       60
E                   35     15       50
Total               80    125      205


28. Select a simple random sample of size 100 from the population of employed heads of 
households in Appendix II. Can you conclude from the information in your sample that 
sex and occupation are independent? Let α = 0.05, and find the p value.

29. During a market-research survey, a firm obtains information on the social class and
major leisure-time activity of 375 heads of household. In the following table, the respondents
are cross-classified by the two criteria. Test at the 0.01 level the null hypothesis that
social class and major leisure-time activity are independent. What use can be made of the
results of this study?

Leisure-time               Social class
activity         1      2      3      4      5     Total

A               10      7      3      4      1       25
B               14     10      7      4      2       37
C                9     25     13     18      3       68
D                7      9     38     44      6      104
E                3      8     14     18     62      105
F                2      3      8     10     13       36
Total           45     62     83     98     87      375


30. A random sample of 150 shoppers was allowed to choose one of three brands of facial 
tissue. The results were as follows: Brand A was chosen 35 times; brand B, 55 times; and 
brand C, 60 times. Do these data provide sufficient evidence to indicate that the three 
brands are not equally preferred? Let α = 0.05.

31. A random sample of 125 absentee reports filed during a year by employees of a large 
firm yielded the following information: 


Day of absence: M T W Th F 

Number of absences: 35 20 15 15 40 

Do these data provide sufficient evidence to indicate, at the 0.01 level of significance, that 
employee absenteeism tends to be higher on some days than on others? What use might 
management make of the results of this study? 



Shoplifting 


Retailers in the United States annually lose goods worth more than $5 billion
because of shoplifting, according to Bellur.* He cites other evidence to indicate
the mammoth proportions of shoplifting in the United States. With these statistics
as background, he designed a questionnaire to reveal more facts about
shoplifting, shoplifters, and their victims.

A sample of 106 midwestern retailers completed the questionnaire. The following
seven types of stores were represented in the sample: department,
grocery, clothing, card and gift, drug, variety and discount, and specialty.

One question asked of retailers was "Do you have a shoplifting problem?"
(Yes or No). In testing the null hypothesis that the type of store is independent
of the presence of a shoplifting problem, Bellur obtained an X² value of 32.60.
What are the degrees of freedom for the test? Should the null hypothesis be
rejected at the 0.05 level? Why? What is the p value for the test? What conclusion
can one draw from these results? What assumptions are necessary?

*Venkatakrishna V. Bellur, "Shoplifting: Can It Be Prevented?" Journal of the Academy of Marketing 
Science, 9 (1981), 78-87. 


Popular Music Artists 


The recording artist is one of the least examined aspects of the music industry,
according to Denisoff and Bridges.* They conducted a study that examined the
"who-what-where" of performers and their music. They analyzed biographical
data on 667 artists, classified as to musical style: (1) rock, (2) soul/rhythm and
blues/disco, (3) "easy listening," (4) country and Western, (5) jazz, (6) classical,
and (7) "other," which included comedy artists and purely electronic artists.
They also classified the musicians in terms of such variables as education, stature
(minor, marginal, or major artist), race, geographic region of origin, sex,
and age. As part of their statistical analysis, Denisoff and Bridges cross-classified
580 artists on the basis of sex and musical style and computed a chi-square
statistic of 41.13. What null hypothesis can one test with these data? Should
the null hypothesis be rejected? Why or why not? Let α = 0.05. Compute the
p value for the test. What assumptions are required?

*R. Serge Denisoff and John Bridges, "Popular Music: Who Are the Recording Artists?" Journal of
Communication, 32 (Winter 1982), 132-142.






Hazards Associated with Alternative Heat Sources 


In response to rising fuel prices, many homeowners, landlords, and businesses 
have invested in improved insulation and alternative sources of heat. In fact, 
the presence of fuel-burning space heaters—especially those that burn wood— 
has become a status symbol for some homeowners. 

These efforts by conservation-minded people, however, are not without potential
hazards. Some authorities fear that improved insulation seals in stale
(and possibly polluted) air along with warmth. Studies have also shown that
fuel-burning space heaters are a major source of carbon monoxide.

In a study of the problem, Lao and others* investigated 27 homes with fuel-burning
space heaters. To see whether they could conclude that there is an
association between type of fuel burned in the space heater and levels of
carbon monoxide, they cross-classified the 27 homes on the basis of these two
variables. They set up two categories of heat source (Wood and Other) and
two categories of level of carbon monoxide (High and Low). Of the 10 homes
using wood-burning heaters, 7 had high levels of carbon monoxide. Five of the
homes that burned other types of fuel had high levels of carbon monoxide.
What can one conclude from these results? State appropriate null and alternative
hypotheses. Compute an appropriate test statistic. Let α = 0.05 and find
the p value for the test. What assumptions are necessary?

*Y. J. Lao, Ronald W. Smith, Terry L. Rich, and Trenton G. Davis, "Carbon Monoxide Levels in Homes with 
Fuel-Burning Space Heaters," Journal of Environmental Health, 44 (January-February 1982), 180-182. In 
this study, a fuel-burning space heater was defined as any heater that burns wood, oil, or gas for the 
purpose of heating living space. 




12. Nonparametric Statistics 


Chapter Objectives: This chapter introduces you to
some statistical techniques that are often characterized
as "quick and dirty" because you can usually do
the calculations quickly, and the assumptions underlying
their use are not as stringent as those underlying
most of the other procedures discussed in this text.
After studying this chapter and working the exercises,
you should be able to do the following.

1. Name and describe the four measurement scales

2. List the advantages and disadvantages of nonparametric
statistical procedures

3. Determine when nonparametric statistical procedures
are appropriate

4. Perform the analysis for each of the following statistical
procedures: (a) the one-sample runs test, (b)
the Wilcoxon test, (c) the Mann-Whitney test, (d)
the sign test, (e) the Kruskal-Wallis one-way analysis
of variance, (f) the Friedman two-way analysis
of variance, (g) the Spearman rank correlation, and
(h) nonparametric regression analysis.



12.1 INTRODUCTION 


Most of the inferential procedures we have discussed had two characteristics in
common. First, they were concerned with population parameters. Second, their
validity depended on a set of rigid assumptions. (There was one inferential procedure
to which this statement does not apply. We’ll discuss it shortly.) The
objectives in estimation and hypothesis testing were to estimate and test hypotheses
about such parameters as a population mean, a population proportion, and a population
variance. For an example of a set of assumptions underlying an inferential
procedure, consider the t test for testing the null hypothesis that two population
means are equal. The use of this test rests on the assumption that the populations
of interest are normally distributed with equal variances.

The one inferential procedure covered earlier that was not concerned with population
parameters and that did not rest on a rigid set of assumptions was the use
of the chi-square distribution in tests of goodness of fit and independence.

We refer to inferential procedures such as the t test and analysis-of-variance 
methods as parametric procedures, because they are concerned with population 
parameters. The body of statistical theory and methodology of which they are a 
part is called parametric statistics. Inferential procedures such as the chi-square 
tests of goodness of fit and independence, which are not concerned with population 
parameters or do not depend on rigid assumptions about the distribution of the 
relevant population, are called nonparametric procedures. The statistical theory 
and methodology relating to these procedures are called nonparametric statistics. 

Some writers—Kendall and Sundrum (1953), for example—distinguish between 
procedures that are not concerned with parameters and procedures that do not 
depend on the distribution of the parent population. They refer to only the former 
as nonparametric statistics, and refer to the latter as distribution-free statistics. At 
an elementary level of discussion, the two are usually treated under the single 
heading of nonparametric statistics. 

Here we shall discuss only a few of the more widely used nonparametric 
procedures. You can find a deeper treatment of the subject in the many textbooks on 
nonparametric statistics. [These include the books by Bradley (1968), Conover 
(1980), Daniel (1978), Gibbons (1976), Kraft and Van Eeden (1968), Marascuilo 
and McSweeney (1977), Mosteller and Rourke (1973), Noether (1976), Pierce 
(1970), Siegel (1956), and Tate and Clelland (1957). Texts by the following 
authors have a more advanced mathematical orientation: Fraser (1957), Gibbons 
(1971), Hajek (1969), Noether (1967), Puri (1970), Puri and Sen (1971), and 
Walsh (1962, 1965). See also the books by Edgington (1969) and Senders (1958).] 

12.2 WHEN TO USE NONPARAMETRIC STATISTICS 

Nonparametric statistics are most often used in four situations, as follows. 

1. The data do not meet the assumptions for a parametric test. For example, the 
nature of the hypothesis to be tested may suggest the use of the t test. But this 
test may not be appropriate because the sample on which the test is to be based 
was drawn from a population known to be substantially nonnormally distributed. 
In such a case, we may use an alternative nonparametric test that does not depend 
on the assumption of a normally distributed parent population. 

2. The data consist merely of ranks. For example, consumers may be asked to 
indicate how much they like several brands of coffee. They may not be able to 
assign each brand a numerical score representing how much they like it, but they 
may be able to rank the brands in order of preference. The analysis, then, is based 
on ranks. Data consisting of ranks are said to be measured on a weak measurement 
scale. Most parametric tests require a stronger measurement scale, but generally 
the nonparametric tests do not. Section 12.3 gives a more complete discussion of 
measurement scales. 

3. The question to be answered does not involve a parameter. For example, if 
we wish to reach a decision about whether a sample is a random sample, we use 
the appropriate nonparametric test. 

4. We need results quickly. As a rule, the calculations required by the 
nonparametric procedures are more quickly and easily carried out than those required by 
parametric ones. The calculations for some of the nonparametric procedures are 
so easy that they can be done right at the data-collection site—in a manufacturing 
plant, for example—by someone relatively unskilled in statistics or mathematics. 


12.3 MEASUREMENT AND MEASUREMENT SCALES 

Stevens (1946) defines measurement as “the assignment of numerals to objects 
or events according to rules.” Here numeral is used to mean any symbol, not 
necessarily a number, that we may assign to an object or event. The use of different 
rules for the assignment of numerals leads to different types of measurement, 
which, in turn, lead to different measurement scales. There are four measurement 
scales: nominal, ordinal, interval, and ratio. We discuss each one briefly here. 
[You can find a more complete discussion in Stevens (1946, 1951), Senders 
(1958), and Siegel (1956).] 

The Nominal Scale The weakest of the four measurement scales is the nominal 
scale. We use this when one object or event is distinguished from another by 
names. For example, the names desk, chair, and file cabinet are assigned to objects 
in an office to distinguish one from the other. Another example of nominal 
measurement occurs when numerals are assigned to football players for 
identification purposes. When objects or events are measured on a nominal scale, we 
can say that one is different from the other. 

The Ordinal Scale When one object not only differs from another with respect 
to some characteristic, but also one has more or less of the characteristic than 
another, we have at least an ordinal scale. There are many instances in business 
in which we use the ordinal measurement scale. We may categorize accounts 
receivable as large, medium, and small. We may label typists as fast, average, 
and slow. We may rate different sources of raw material for the manufacture of 
some product as good, better, and best. We may rank salespersons according to 
the strength of their personalities. When we rank items, it is possible to tell which 
ones have more or less of the characteristic on which the ranking is based (except 
in the case of ties). But it is not possible to tell how much more or less of the 
characteristic one object has than another. 

The Interval Scale When we can say not only that one object is greater or less 
than another, but also that one is greater or less by a specified amount, we have 
achieved measurement on at least an interval scale. A characteristic of the interval 
scale is the presence of a unit of measurement. Temperature as measured on a 
Fahrenheit thermometer is an example of measurement on an interval scale. We 
can say that it is warmer today than yesterday. We can also say that it is five 
degrees warmer. 

The Ratio Scale The ratio scale is characterized by the use of both a unit of 
measurement and a true zero point. The fact that a thermometer shows the 
temperature to be zero degrees does not mean that there is a complete absence of 
temperature. On the other hand, we measure weight on a ratio scale. When a 
weighing device shows 0, this indicates an absence of weight. Measurement on 
a ratio scale allows us to say that one object is so many times as great as (or less 
than) another, and that it is so many units more than (or less than) another. For 
example, suppose that a man weighs 180 pounds and his child weighs 60 pounds. 
We can say (1) the father weighs three times as much as the child, and (2) the 
father weighs 120 pounds more than the child. The ratio scale is the “strongest” 
of the four measurement scales. [Some authors, for example Conover (1980), 
Siegel (1956), and Senders (1958), indicate that different tests require different 
measurement scales. Although this idea appears to be generally followed in 
practice, Anderson (1961) and Gaito (1959) present some interesting alternative points 
of view.] 
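
The difference between the interval and ratio scales can be checked with a little arithmetic. The Python sketch below is our own illustration, not part of the original discussion. It changes the unit of measurement and asks whether a ratio of two measurements survives the change. The two temperatures (80 and 40 degrees Fahrenheit) are hypothetical values chosen for the example; the weights are those of the father and child above.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit reading to Celsius."""
    return (f - 32.0) * 5.0 / 9.0


def pounds_to_kilograms(lb):
    """Convert a weight in pounds to kilograms."""
    return lb * 0.45359237


# Interval scale: the zero point is arbitrary, so ratios depend on the unit.
warm_f, cool_f = 80.0, 40.0  # hypothetical temperatures
ratio_fahrenheit = warm_f / cool_f
ratio_celsius = fahrenheit_to_celsius(warm_f) / fahrenheit_to_celsius(cool_f)

# Ratio scale: zero means "none", so ratios are the same in any unit.
father_lb, child_lb = 180.0, 60.0
ratio_pounds = father_lb / child_lb
ratio_kilograms = pounds_to_kilograms(father_lb) / pounds_to_kilograms(child_lb)

print(ratio_fahrenheit, ratio_celsius)  # the two ratios disagree
print(ratio_pounds, ratio_kilograms)    # the two ratios agree
```

The statement "twice as warm" holds only in Fahrenheit degrees, whereas "three times as heavy" holds in pounds and in kilograms alike.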


12.4 ADVANTAGES AND DISADVANTAGES OF 
NONPARAMETRIC STATISTICS 

The advantages and disadvantages of nonparametric statistical procedures have 
been listed and discussed by Gibbons (1971), Moses (1952), Mosteller and Bush 
(1954), and Siegel (1956). We may summarize them as follows. 


Advantages 


1. The probability statements accompanying the statistical tests are usually exact. 

2. The calculations are usually easily and rapidly performed. 

3. The assumptions are usually few and easily met. 

4. As a consequence of number 3, nonparametric procedures are widely applicable. 

5. We can analyze data measured on a weak measurement scale by nonparametric 
techniques. 


Disadvantages 


1. Nonparametric procedures, because of their simplicity and ease of computation 
with small samples, may be applied in cases in which parametric procedures would 
be more appropriate. Such a practice is inefficient and should be avoided. 

2. For large samples, the calculations may become burdensome unless one uses 
approximations. 

Additional disadvantages are listed by Gaito (1959), who also takes exception 
to some of the advantages listed by other authors. 


12.5 THE ONE-SAMPLE RUNS TEST 

We have shown the importance of random samples in statistical inference. It is 
valuable to have a procedure for testing the null hypothesis that a sample is indeed 
a random sample. This section presents a test based on the number of runs present 
in the sample. A run is defined as a sequence of like symbols preceded and 
followed by either a different symbol or no symbol at all. Two types of symbols 
are used. One or the other is assigned to each observation in the sample, depending 
on whether the observation is greater than or less than the sample median or mean. 
If we display the symbols in the order in which the numerical values they represent 
were selected, we can easily recognize the number of runs in the data. Suppose, 
for example, that we observe the following sequence of symbols in a sample of 
size 10, where B stands for a value below the median and A represents a value 
above the median: 


BBBBB AAAAA 

In this sequence there are only two runs. This seems to be too few to support a 
hypothesis of randomness. By contrast, consider the following sequence: 

BABABABABA 

This sequence contains 10 runs. Again we suspect a lack of randomness. 
Intuitively it seems that some systematic, rather than random, procedure is operative. 
In a particular sequence, then, we suspect a lack of randomness if there appear 
to be either too few or too many runs. 

Finally, suppose that we observe the following sequence, which contains six 
runs: 


BABBABBAAA 

This sequence seems to be rather well mixed. There does not appear to be any 
reason to suspect a lack of randomness. 
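
Counting runs by eye is easy for ten symbols but error-prone for longer sequences. The following Python sketch, offered only as an illustration, counts the runs in the three sequences above. The function name `count_runs` is our own.

```python
def count_runs(symbols):
    """Count the runs in a sequence: a new run starts whenever the
    current symbol differs from the one before it."""
    runs = 0
    previous = object()  # sentinel that is not equal to any symbol
    for symbol in symbols:
        if symbol != previous:
            runs += 1
        previous = symbol
    return runs


for sequence in ("BBBBBAAAAA", "BABABABABA", "BABBABBAAA"):
    print(sequence, count_runs(sequence))
```

The three sequences yield 2, 10, and 6 runs, in agreement with the counts given in the text.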


To determine whether an observed number of runs is small enough or large 
enough to cause rejection of the null hypothesis that the sample is a random one, 
we can consult the tables by Eisenhart and Swed (1943). Appendix Table J gives 
a portion of these tables. To use Table J, we designate the number of symbols of 
one kind as $n_1$ and the number of symbols of the other kind as $n_2$. (The total 
sample size is $n_1 + n_2 = n$.) Table J gives critical values of $r$, the number of 
runs, for values of $n_1$ and $n_2$ through 20. If the observed number of runs is less 
than or equal to the appropriate critical value of $r$ in Table Ja or greater than or 
equal to the appropriate critical value of $r$ in Table Jb, we can reject the null 
hypothesis of randomness at the 0.05 level of significance. 

In the first sequence given above, we have $n_1$ = number of B’s = 5, $n_2$ = 
number of A’s = 5, and $r = 2$. Table Ja indicates that we can reject the hypothesis 
of randomness in that sequence at the 0.05 level of significance, since the critical 
value is 2. For the second sequence, where $n_1 = 5$, $n_2 = 5$, and $r = 10$, Table 
Jb shows that we can reject the hypothesis of randomness at the 0.05 significance 
level, since here the critical value is 10. The last sequence, where $n_1 = 5$, $n_2 = 5$, 
and $r = 6$, may be in random order, since, by Tables Ja and Jb, $r = 6$ is not 
significant. 

The use of the runs test is not limited to testing the null hypothesis that a sample 
is random. We can use it to test any sequence for randomness, no matter how the 
sequence is generated. Sequences that we may test for randomness include, for 
example, the arrangement of males and females in a cafeteria line, the presence 
and absence of rain over a period of several days, the wins and losses of an 
athletic team, or the sequence of correct answers to a true-false quiz. 

EXAMPLE 12.5.1 The quality-control department of a bubble-bath company requires 
the mean weight of packages of its product to be 17 ounces. A sample of 20 
consecutive packages filled by the same machine is taken from the assembly line 
and weighed, with the following results (in ounces): 17.9, 17.5, 17.2, 17.3, 16.5, 
16.8, 16.7, 17.2, 17.4, 17.6, 17.5, 17.8, 16.8, 16.5, 16.6, 17.7, 17.6, 17.7, 
17.8, 17.2. Do these data provide sufficient evidence to indicate a lack of 
randomness in the pattern of over and under fills? 

From the results, we see that $n_1$ = the number of weights less than the required 
mean of 17 ounces = 6, $n_2$ = the number of weights greater than 17 ounces = 14, 
and $r = 5$. Reference to Table J indicates that we can reject the 
null hypothesis of randomness at the 0.05 level of significance. 
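
The counts in Example 12.5.1 can be reproduced with a short Python sketch. It is an illustration only. The function name `runs_summary` is hypothetical, and dropping any observation exactly equal to the reference value is our assumption (no such observation occurs in these data).

```python
from itertools import groupby

weights = [17.9, 17.5, 17.2, 17.3, 16.5, 16.8, 16.7, 17.2, 17.4, 17.6,
           17.5, 17.8, 16.8, 16.5, 16.6, 17.7, 17.6, 17.7, 17.8, 17.2]


def runs_summary(values, reference):
    """Label each value B (below) or A (above) the reference value and
    return (number below, number above, number of runs)."""
    # Assumption: values equal to the reference are left out.
    labels = ["B" if v < reference else "A" for v in values if v != reference]
    n_below = labels.count("B")
    n_above = labels.count("A")
    # groupby yields one group per block of identical adjacent labels.
    runs = sum(1 for _ in groupby(labels))
    return n_below, n_above, runs


n1, n2, r = runs_summary(weights, 17.0)
print(n1, n2, r)  # 6 14 5
```

The critical values themselves still come from Table J; the sketch only prepares the three quantities needed to enter the table.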

When either $n_1$ or $n_2$ is larger than 20, we cannot use Table J to test for 
significance. For samples too large to use Table J, we can compute the following 
test statistic: 

$$z = \frac{r - \left(\dfrac{2n_1 n_2}{n_1 + n_2} + 1\right)}{\sqrt{\dfrac{2n_1 n_2\,(2n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2\,(n_1 + n_2 - 1)}}} \qquad (12.5.1)$$


The z of Equation 12.5.1 is compared for significance with tabulated values of 
the standard normal distribution of Appendix Table C. 


EXAMPLE 12.5.2 At a certain gasoline station, two grades of gasoline are available, 
A and B. On a certain day the first 50 gasoline purchases are of grades A and B 
in the following order: AA B AA B A BB AAA BB A BB A BB A BB A BB 
AA BBBB AA B A B A B AAA B AAAAA BB. Does this sequence of purchases 
appear to indicate a random selection of grades of gasoline? (Let $\alpha = 0.05$.) 

We have $n_1$ = number of A’s = 26, $n_2$ = number of B’s = 24, and $r = 28$. 
Since $n_1$ and $n_2$ are both greater than 20, we cannot use Table J. Thus we use 
Equation 12.5.1 to compute 

$$z = \frac{28 - \left(\dfrac{2(26)(24)}{26 + 24} + 1\right)}{\sqrt{\dfrac{2(26)(24)[2(26)(24) - 26 - 24]}{(26 + 24)^2 (26 + 24 - 1)}}} = 0.58$$

Table C shows that a $z$ of 0.58 is not significant at the $\alpha = 0.05$ level of 
significance. We are therefore unable to reject the null hypothesis that the sequence 
of gasoline purchases is in random order. 

Exercises 




12.5.1 In a large company, the last 15 promotions to top-level jobs were males and females 
in the following order: M FF MMMMM FFF MMM F. Test for randomness. Let $\alpha = 0.05$. 

12.5.2 A sample of 60 consecutively produced bolts is selected from an assembly line 
and measured. The following are the deviations in thousandths of an inch of the lengths 
of the bolts from 3.000 inches: -5, 4, 2, -2, 3, 8, 4, 3, 3, 1, 4, 1, 1, 5, 3, -2, 6, 1, 
3, -11, -10, 12, 5, 3, 7, 8, -9, 3, 3, -2, 10, -1, 4, -5, 6, -3, 1, 5, 3, 5, 3, 
-1, -5, 3, -7, -4, 4, -2, -1, -2, -1, 10, -5, -5, 5, -2, 1, -7, 4, 4. 
Dichotomize the measurements as to whether they are above or below 3.000 inches. Test 
for randomness. Let $\alpha = 0.05$. 

12.5.3 Figure 9.5.4 shows the residuals resulting from the regression analysis of Example 
9.4.1. They are -0.7, -11.5, +6.1, +13.3, -10.6, -4.2, +15.2, -9.6, +7.5, 
-5.5. Can we conclude from these data that the residuals are independent? Use the runs 
test to test for randomness in the sequence of plus and minus signs. Randomness is 
compatible with independence. Let $\alpha = 0.05$. 


12.6 THE WILCOXON SIGNED-RANK TEST FOR LOCATION 

Sometimes a business researcher wishes to test a null hypothesis about a population 
mean, but for some reason cannot use either z or t as a test statistic. The z statistic 
may be ruled out, for example, because the researcher has a small (less than 30) 
sample from a population that is known to be grossly nonnormally distributed. 
Therefore the central limit theorem is not applicable. The t statistic may not work 
well because the sampled population does not sufficiently approximate a normal 
distribution. In such a situation, a possible alternative is to use a nonparametric 
procedure. 

A nonparametric procedure that the researcher can often use for the one-sample 
case in which neither the z statistic nor the t statistic is appropriate is the Wilcoxon 
(1945) signed-rank test for location. This procedure is based on the following 
assumptions about the data. 

1. The sample is random. 

2. The variable is continuous. 

3. The population is symmetrically distributed about its mean $\mu$. 

4. The measurement scale is at least interval. 

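Here are the null hypotheses about some unknown population mean $\mu$ that 
may be tested against a hypothesized value $\mu_0$, and their alternatives. 

(a) $H_0\colon \mu = \mu_0$      (b) $H_0\colon \mu \ge \mu_0$      (c) $H_0\colon \mu \le \mu_0$ 
    $H_1\colon \mu \ne \mu_0$          $H_1\colon \mu < \mu_0$          $H_1\colon \mu > \mu_0$ 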


The calculations involved in applying the Wilcoxon procedure are as follows. 

1. Subtract the hypothesized mean $\mu_0$ from each observation $x_i$, to obtain 

$$d_i = x_i - \mu_0$$

If any $x_i$ is equal to the hypothesized mean, so that $d_i = 0$, eliminate it from the 
calculations and reduce $n$ accordingly. 

2. Rank the $d_i$ from the smallest to the largest without regard to the sign of $d_i$. 
That is, consider only the absolute value of the $d_i$, designated by $|d_i|$, when ranking 
them. If two or more of the $|d_i|$ are equal, assign each tied value the mean of the 
rank positions the tied values occupy. If, for example, the three smallest $|d_i|$ are 
all equal, place them in rank positions 1, 2, and 3, but assign each a rank of 
(1 + 2 + 3)/3 = 2. 

3. Assign each rank the sign of the $d_i$ that yields that rank. 

4. Find $T_+$, the sum of the ranks with positive signs, and $T_-$, the sum of the 
ranks with negative signs. One of these is the test statistic, depending on the 
nature of the alternative hypothesis. To test for significance, enter Appendix Table 
K with the computed test statistic, the sample size $n$, and the chosen value of $\alpha$. 
In Table K, the one-sided significance level is denoted by $\alpha'$ and the two-sided 
significance level by $\alpha''$. If the test is two-sided, reject $H_0$ at the $\alpha$ level of 
significance if either $T_+$ or $T_-$ (whichever has the smaller absolute value) is 
smaller than $d$ for $n$ and tabulated $\alpha''$ (two-sided). If the alternative hypothesis is 
$H_1\colon \mu < \mu_0$, reject $H_0$ at the $\alpha$ level of significance if $T_+$ is less than $d$ for $n$ and 
tabulated $\alpha'$ (one-sided). If the alternative hypothesis is $H_1\colon \mu > \mu_0$, reject $H_0$ at 
the $\alpha$ level of significance if $T_-$ is less than $d$ for $n$ and tabulated $\alpha'$ (one-sided). 

Before we look at an example, let us consider the rationale underlying the 
Wilcoxon signed-rank test. Suppose that $H_0$ is true, that is, the population mean 
$\mu$ is equal to the hypothesized mean $\mu_0$. And suppose that the assumptions about 
the data are met. Then the probability of observing a positive difference $d_i$ of a 
given magnitude is equal to the probability of observing a negative difference of 
the same magnitude. For a given sample, then, if $H_0$ is true, we would expect 
$T_+$ and $T_-$ to be about equal. Therefore, a sufficiently small value of $T_+$ or a 
sufficiently small value of $T_-$ (depending on the alternative hypothesis) will cause 
us to reject $H_0$. 
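
The phrase "about equal" can be made more precise with a standard identity, added here as a supplement to the text. Because the ranks 1 through $n$ (or their midranks, when ties occur) are divided between the two sums, we have

$$T_+ + T_- = 1 + 2 + \cdots + n = \frac{n(n+1)}{2}, \qquad E(T_+) = E(T_-) = \frac{n(n+1)}{4} \ \text{ when } H_0 \text{ is true.}$$

With $n = 19$, for instance, the two sums must total $19(20)/2 = 190$, and each is expected to be near 95 when $H_0$ is true. A value of $T_+$ or $T_-$ far below 95 is therefore evidence against $H_0$.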

The following example illustrates the use of the Wilcoxon signed-rank test. 

EXAMPLE 12.6.1 A market analyst wants to know whether he can conclude, at the 
0.05 level of significance, that the mean annual family income in a certain 
low-income area is less than $15,000. Interviews with heads of household in a random 
sample of 20 families from the area yield the following incomes (in dollars per 
year): 8900, 9300, 10,100, 18,000, 10,300, 12,200, 7500, 9900, 11,200, 15,300, 
17,200, 23,000, 12,500, 15,100, 14,900, 14,300, 16,200, 13,900, 15,000, and 
18,000. The market analyst, who believes that the distribution of incomes in the 
area is symmetric, conducted the following hypothesis test. 

The statistical hypotheses are 

$$H_0\colon \mu \ge \$15{,}000 \qquad H_1\colon \mu < \$15{,}000$$

Table 12.6.1 gives the calculation of the test statistic. 

TABLE 12.6.1 Calculation of the test statistic for Example 12.6.1 

Annual family    $d_i = x_i - \mu_0$    Rank of     Signed rank 
income ($x_i$)                          $|d_i|$     of $|d_i|$ 

 8,900           -6,100                 17          -17 
 9,300           -5,700                 16          -16 
10,100           -4,900                 14          -14 
18,000           +3,000                 10.5        +10.5 
10,300           -4,700                 13          -13 
12,200           -2,800                  9           -9 
 7,500           -7,500                 18          -18 
 9,900           -5,100                 15          -15 
11,200           -3,800                 12          -12 
15,300             +300                  3           +3 
17,200           +2,200                  7           +7 
23,000           +8,000                 19          +19 
12,500           -2,500                  8           -8 
15,100             +100                  1.5         +1.5 
14,900             -100                  1.5         -1.5 
14,300             -700                  4           -4 
16,200           +1,200                  6           +6 
13,900           -1,100                  5           -5 
15,000                0                 Eliminate from analysis 
18,000           +3,000                 10.5        +10.5 

$T_+ = 57.5$     $T_- = 132.5$ 
The test statistic is $T_+ = 57.5$. 

Reference to Table K, with $n = 19$ and $\alpha' = 0.052$, reveals that, since 57.5 
is larger than 55, we cannot reject $H_0$ at the 0.05 level. In fact, for this test, 
$p > 0.052$. We conclude, then, that the mean annual family income in the area may 
be $15,000 or more. 
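
The calculations of Table 12.6.1 can be verified with the following Python sketch of steps 1 through 4 of the Wilcoxon procedure. It is offered as an illustration, not as a definitive implementation. The function names `midranks` and `signed_rank_sums` are our own, and the critical value must still be read from Table K.

```python
def midranks(values):
    """Rank values from smallest to largest; tied values each receive
    the mean of the rank positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = ((start + 1) + (end + 1)) / 2.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def signed_rank_sums(observations, mu0):
    """Return (n, T_plus, T_minus) after dropping zero differences."""
    d = [x - mu0 for x in observations if x != mu0]        # step 1
    ranks = midranks([abs(v) for v in d])                  # step 2
    t_plus = sum(r for r, v in zip(ranks, d) if v > 0)     # steps 3 and 4
    t_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    return len(d), t_plus, t_minus


incomes = [8900, 9300, 10100, 18000, 10300, 12200, 7500, 9900, 11200, 15300,
           17200, 23000, 12500, 15100, 14900, 14300, 16200, 13900, 15000, 18000]

n, t_plus, t_minus = signed_rank_sums(incomes, 15000)
print(n, t_plus, t_minus)  # 19 57.5 132.5
```

The observation equal to 15,000 is dropped, so $n$ falls from 20 to 19, and the two sums agree with the totals in Table 12.6.1.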

12.6.1 Sixteen laboratory animals were fed a special diet from birth through age twelve 
weeks. Their weight gains (in grams) were as follows: 64, 69, 80, 66, 65, 64, 66, 65, 
77, 75, 67, 67, 68, 74, 70, 77. Can we conclude from these data that the diet results in 
a mean weight gain of less than 70 grams? Let $\alpha = 0.05$, and find the p value. 

12.6.2 A psychologist selects a random sample of 25 handicapped assembly-line workers 
from among those employed at several factories of a large industry. Their manual dexterity 
scores were as follows: 32, 52, 21, 39, 23, 55, 36, 27, 37, 41, 34, 51, 51, 35, 46, 40, 
31, 19, 41, 33, 52, 36, 34, 46, 41. Do these data provide sufficient evidence to indicate 
that the mean score for the population is not 45? Let $\alpha = 0.05$. Find the p value. 

12.6.3 A population of adolescent laborers who dropped out of high school at age 16 had 
a mean reading comprehension score of 60. A random sample of 21 adolescents who were 
still in school at age 16 made the following scores on the same test: 72, 62, 52, 57, 91, 
78, 74, 67, 51, 62, 84, 59, 51, 57, 89, 64, 80, 72, 92, 64, 57. Do these data provide 
sufficient evidence to indicate that the mean score for adolescents still in school at age 16 
is greater than that for dropouts employed as laborers? Let $\alpha = 0.05$. Find the p value. 


12.7 THE MANN-WHITNEY TEST 


It is not unusual for researchers to have to test a null hypothesis about the 
difference between two location parameters under conditions that render both z and t 
inappropriate as test statistics. In such situations, researchers usually look for an 
appropriate nonparametric procedure. When the objective is to test for a significant 
difference between two location statistics computed from independent samples, 
the nonparametric procedure most often used is the Mann-Whitney test, proposed 
by Mann and Whitney (1947). Sometimes, when the test statistic is computed by 
a formula different from (but related to) the one we give here, the procedure is 
called the Mann-Whitney U test. 

The test focuses on the median as the measure of location or central tendency. 
Recall that when a population is symmetric, the median and the mean are equal. 
Therefore, when the two sampled populations are symmetric, conclusions about 
their medians based on the Mann-Whitney test also apply to their means. 

Let $M_X$ = the median of population 1 and $M_Y$ = the median of population 2. 
The following are the null hypotheses that may be tested, along with their 
alternatives. 

(a) $H_0\colon M_X = M_Y$      $H_1\colon M_X \ne M_Y$     Two-sided 
(b) $H_0\colon M_X \ge M_Y$    $H_1\colon M_X < M_Y$       One-sided 
(c) $H_0\colon M_X \le M_Y$    $H_1\colon M_X > M_Y$       One-sided 



The following are the assumptions underlying the Mann-Whitney test. 

1. The samples have been randomly and independently drawn from their respective 
populations. Let $x_1, x_2, \ldots, x_{n_1}$ represent the sample values drawn from 
population 1, and let $y_1, y_2, \ldots, y_{n_2}$ represent the sample values drawn from 
population 2. 

2. The variable of interest is continuous. 

3. The measurement scale used is at least ordinal. 

4. The distributions of the two sampled populations, if they differ at all, differ 
only with respect to location. 

If the measurement scale for each sample is ordinal, you must be able to rank 
the observations of one sample with those of another when the two samples are 
combined as described below. In practice, you may need an interval scale in order 
for this to be possible. 

To compute the Mann-Whitney test statistic, combine the two samples and rank 
all sample observations from smallest to largest. Assign tied observations the mean 
of the rank positions they would have occupied had there been no ties. Then sum 
the ranks of the observations from population 1 (that is, the X’s). If the location 
parameter of population 1 is smaller than that of population 2, we expect (for 
equal sample sizes) the sum of the ranks for population 1 to be smaller than that 
for population 2. Similarly, if the location parameter of population 1 is larger than 
the location parameter of population 2, we expect the reverse to be true. The test 
statistic is based on this rationale. Depending on the null hypothesis, either a 
sufficiently small or a sufficiently large sum of ranks assigned to sample 
observations from population 1 causes us to reject the null hypothesis. 

The test statistic is 


$$T = S - \frac{n_1(n_1 + 1)}{2} \qquad (12.7.1)$$

where $S$ is the sum of the ranks assigned to the sample observations from 
population 1. 

The appropriate decision rules for an $\alpha$ level of significance are as follows. 

1. When we test $H_0\colon M_X = M_Y$, we reject $H_0$ for either a sufficiently small or a 
sufficiently large value of $T$. Therefore, we reject $H_0$ if the computed value of $T$ 
is less than $W_{\alpha/2}$ or greater than $W_{1-\alpha/2}$, where $W_{\alpha/2}$ is the critical value of $T$ 
given in Appendix Table L and $W_{1-\alpha/2}$ is given by 

$$W_{1-\alpha/2} = n_1 n_2 - W_{\alpha/2} \qquad (12.7.2)$$

2. When we test $H_0\colon M_X \ge M_Y$, we reject $H_0$ for sufficiently small values of $T$. 
We reject $H_0$ if the computed $T$ is less than $W_\alpha$, the critical value of $T$ given in 
Table L for $n_1$, $n_2$, and $\alpha$. 

3. When we test $H_0\colon M_X \le M_Y$, we reject $H_0$ for sufficiently large values of $T$. 
Therefore, we reject $H_0$ if the computed $T$ is greater than $W_{1-\alpha}$, where 

$$W_{1-\alpha} = n_1 n_2 - W_\alpha \qquad (12.7.3)$$


In all cases, when we reject $H_0$, we conclude that $H_1$ is true. If we fail to reject 
$H_0$, we conclude that $H_0$ may be true. The following example illustrates the use 
of the Mann-Whitney test. 


EXAMPLE 12.7.1 A researcher gives a random sample of 15 college men and an 
independent random sample of 20 college women a test to measure their 
knowledge of ecological issues. Table 12.7.1 shows the scores. We wish to know 
whether we can conclude on the basis of these data that the two populations of 
scores are different with respect to their medians. Let $\alpha = 0.05$. 

The statistical hypotheses are: 

$$H_0\colon M_X = M_Y \qquad H_1\colon M_X \ne M_Y$$

The researcher is unwilling to assume that the populations are approximately 
normally distributed. Thus t is not the appropriate test statistic. Since the sample 
sizes are small, one can’t apply the central limit theorem. Therefore z is not a 
valid test statistic. We presume that the assumptions for the Mann-Whitney test 
are met and use that procedure. 

Table 12.7.2 shows the scores of Table 12.7.1 in rank order, with the ranks 
attached. We see in Table 12.7.2 that $S = 186$. By Equation 12.7.1, then, the 
test statistic is 

$$T = 186 - \frac{15(15 + 1)}{2} = 66$$


From Table L, $W_{\alpha/2} = 91$ for $n_1 = 15$, $n_2 = 20$, and $\alpha/2 = 0.025$. By 
Equation 12.7.2, we compute 

$$W_{1-\alpha/2} = 15(20) - 91 = 209$$

Since the computed value of the test statistic, $T = 66$, is less than 91, we 
reject $H_0$. 
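
The rank sum $S$ and the statistic $T$ of Equation 12.7.1 can be checked with a short Python sketch. It is an illustration only. The function name `mann_whitney_t` is our own, and the critical value 91 still has to be read from Table L.

```python
def mann_whitney_t(x, y):
    """Return (S, T): S is the sum of the combined-sample ranks of the
    x observations, and T = S - n1(n1 + 1)/2 as in Equation 12.7.1.
    Tied observations receive the mean of the rank positions they occupy."""
    combined = sorted(x + y)
    rank_of = {}
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1] == combined[i]:
            j += 1
        rank_of[combined[i]] = ((i + 1) + (j + 1)) / 2.0  # midrank of the tie block
        i = j + 1
    s = sum(rank_of[v] for v in x)
    n1 = len(x)
    return s, s - n1 * (n1 + 1) / 2.0


men = [18.50, 17.00, 12.40, 14.00, 16.00, 15.20, 20.00, 12.50, 12.50, 19.00,
       12.00, 19.25, 19.50, 10.00, 11.00]
women = [25.00, 19.10, 15.00, 18.00, 23.00, 18.75, 21.00, 18.25, 16.20, 21.10,
         18.50, 24.00, 19.75, 17.50, 17.25, 18.30, 20.00, 17.75, 16.30, 19.20]

s, t = mann_whitney_t(men, women)
print(s, t)  # 186.0 66.0
```

The result agrees with $S = 186$ and $T = 66$ as computed in the example.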

When the null hypothesis is true, the sampling distribution of the Mann-Whitney 
test statistic is symmetric. Since this is the case, we can find the two-sided p value 
by doubling the one-sided p value. We consult Table L for $n_1 = 15$ and $n_2 = 20$. 
We find that the computed value of our test statistic, 66, is between 60 and 
74. Consequently, for this test, 

$$2(0.005) > p > 2(0.001) \quad \text{or} \quad 0.010 > p > 0.002$$


TABLE 12.7.1 Data for Example 12.7.1 

Men's scores, X 
18.50  17.00  12.40  14.00  16.00  15.20  20.00  12.50  12.50  19.00 
12.00  19.25  19.50  10.00  11.00 

Women's scores, Y 
25.00  19.10  15.00  18.00  23.00  18.75  21.00  18.25  16.20  21.10 
18.50  24.00  19.75  17.50  17.25  18.30  20.00  17.75  16.30  19.20 


TABLE 12.7.2 Data of Table 12.7.1 in rank order, with ranks attached 

Men's scores, X    Ranks     Women's scores, Y    Ranks 

10.00               1 
11.00               2 
12.00               3 
12.40               4 
12.50               5.5 
12.50               5.5 
14.00               7 
                             15.00                 8 
15.20               9 
16.00              10 
                             16.20                11 
                             16.30                12 
17.00              13 
                             17.25                14 
                             17.50                15 
                             17.75                16 
                             18.00                17 
                             18.25                18 
                             18.30                19 
18.50              20.5      18.50                20.5 
                             18.75                22 
19.00              23 
                             19.10                24 
                             19.20                25 
19.25              26 
19.50              27 
                             19.75                28 
20.00              29.5      20.00                29.5 
                             21.00                31 
                             21.10                32 
                             23.00                33 
                             24.00                34 
                             25.00                35 

Total = S = 186 




Finding the p value in Table L can sometimes be tricky. The following example 
will help clarify the procedure. 

Consider the two-sided test in which the computed value of the test statistic 
exceeds the largest value in Table L for $\alpha/2$, $n_1$, and $n_2$. Suppose that in our 
example the computed value of the test statistic had been 240. Since 240 exceeds 
$W_{1-\alpha/2} = (15)(20) - 91 = 209$, we would reject $H_0$. To find the one-sided p 
value, we would compute $T' = n_1 n_2 - T$, which for this example is $T' = 
(15)(20) - 240 = 60$. Since the probability, when the null hypothesis is true, of 
obtaining a value of the test statistic as small as 60 is 0.001, the one-sided p value 
is 0.001. The two-sided p value, then, would be 2(0.001) = 0.002. In the 
one-sided test in which the alternative hypothesis is $M_X > M_Y$, you may have to 
compute $T'$ in order to determine the p value. 
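
The two-sided decision rule and the reflection $T' = n_1 n_2 - T$ can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `two_sided_decision` is our own, and the lower critical value (91 here) is supplied by the user from Table L rather than computed.

```python
def two_sided_decision(t, n1, n2, w_lower):
    """Apply the two-sided Mann-Whitney rule for a lower critical value
    w_lower read from Table L.

    Returns (reject, w_upper, lookup): w_upper is n1*n2 - w_lower
    (Equation 12.7.2), and lookup is the value to locate in the lower
    tail of Table L when bracketing the one-sided p value."""
    w_upper = n1 * n2 - w_lower
    reject = t < w_lower or t > w_upper
    # The null distribution of T is symmetric, so an upper-tail value T
    # is reflected to T' = n1*n2 - T before entering the table.
    lookup = min(t, n1 * n2 - t)
    return reject, w_upper, lookup


print(two_sided_decision(66, 15, 20, 91))   # (True, 209, 66)
print(two_sided_decision(240, 15, 20, 91))  # (True, 209, 60)
```

For the computed value 66 no reflection is needed, whereas the hypothetical value 240 is reflected to 60, the value used in the discussion above.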

Since, for our present example, we reject $H_0$, we conclude that $H_1$ is true. That 
is, we conclude that the sampled populations of college men and women differ, 
on the average, with respect to knowledge of ecological issues. 


12.7.1 A firm wishes to compare two methods of communicating information about a new 
product. Two groups of subjects are chosen to take part in the experiment. Subjects in the 
first group learn about the new product by Method A. Subjects in the second group learn 
about it by Method B. At the end of the experiment, each subject is given a test to measure 
knowledge of the new product. The results are shown in the following table. Do these 
data provide sufficient evidence to indicate a difference in median scores between the two 
groups? Let $\alpha = 0.05$. Determine the p value. 

Method A scores 50 59 60 71 80 81 80 78 72 77 73 75 75 77 76 

Method B scores 52 54 58 78 65 69 61 60 72 60 59 65 69 68 65 

12.7.2 The following table gives the Brinell hardness numbers of specimens in random 
samples from two competing potential raw materials for a certain product. Can an 
investigator conclude that Material B has a higher Brinell hardness number, on the average, 
than Material A? 


A 160 162 165 171 162 170 168 165 166 172 160 162 168 171 170 

B 167 168 170 172 174 168 171 170 172 171 172 175 172 168 163 169 

12.7.3 The following table shows the monthly salaries of independent samples of 20 men 
and 20 women who do the same type of work. Do these data suggest that there is a 
difference in the median salaries for men and women doing this particular job? Let 
$\alpha = 0.05$. Determine the p value. 

Men (monthly salary, dollars) 
818  954  942  946  963  881  893  788  819  863 
941  891  935  749  865  847  840  902  973  965 

Women (monthly salary, dollars) 
841  886  795  955  887  983  836  970  892  894 
875  877  960  763  934  767  771  961  715  800 


12.7.4 A team of industrial psychologists draws a sample of the records of those applicants 
for a certain job who have completed high school. They select an independent random 
sample of the records of applicants for the same job who were high school dropouts. The 
following table shows the emotional maturity test scores of the applicants in the two groups. 
Do these data provide sufficient evidence to indicate that the two sampled populations 
have different medians? Let a = 0.05. Determine the p value. 


High school graduates High school dropouts 


89 

79 

62 

85 

85 

72 

65 

97 

56 

63 

67 

59 

78 

51 

69 

82 

96 

56 

85 

47 

57 

71 

72 

94 

77 

58 

49 

63 

67 

79 

62 


66 

54 

41 

64 

78 

69 


65 

74 

58 

86 

83 

57 


64 

49 

67 





12.8 THE SIGN TEST 


We perform the Mann-Whitney test, presented in Section 12.7, with data from 
two independent samples. We often want to analyze two sets of data that are not 
the results of two independently drawn samples. The data may be “before and 
after” scores for the same subject or scores for matched subjects that have been 
treated in different ways. We speak of such data as data from two related samples. 
When the necessary assumptions are met, we can analyze the data of two related 
samples by the parametric paired comparisons test that is used to test null hy¬ 
potheses about the mean difference in the two sets of observations. (See Chap¬ 
ter 7.) 

When the assumptions underlying the parametric paired comparisons test are 
not met, or when the observations are based on a weak measurement scale, we 
must use an alternative test. A simple nonparametric test that we can use is the 
sign test, which focuses on the median, rather than the mean, as a measure of 
central tendency. As we have seen, the median and mean coincide in symmetric 
distributions. Specifically, we use the sign test to test hypotheses about median 
differences, where we obtain differences by comparing various pairs of observa¬ 
tions. Examples would be “before and after” scores made by the same subject 
or the two scores made by paired subjects. Let us call the values of one set of 
scores $X_i$ and the values of the other set $Y_i$. We observe the differences $X_i - Y_i$. 
If $X_i > Y_i$, we record the difference as +. If $X_i < Y_i$, we record the difference 
as −. The test uses the resulting sample of pluses and minuses. 

Perhaps the most common use of the sign test is to test the null hypothesis that 
the median difference is 0. This may be stated more compactly as follows: 

$H_0$: $P(X_i > Y_i) = P(X_i < Y_i) = 0.5$ 

The null hypothesis states that a positive difference is as likely to occur as a 
negative one. In a random sample of pluses and minuses obtained by computing 
$(X_i - Y_i)$ for each pair of observations, then, we would expect about as many 
pluses as minuses when the null hypothesis is true. Alternatively, we can state 
the null hypothesis as 

$H_0$: $P(+) = P(-) = 0.5$ 

We can think of obtaining a series of pluses and minuses in this manner as a 
binomial experiment with parameters n and p , where n is the number of pairs of 
observations and p = 0.5. 

The test statistic for the sign test is either the observed number of plus signs or 
the observed number of minus signs. The nature of the alternative hypothesis 
determines which of these test statistics is appropriate. 

In a given test, any one of the following alternative hypotheses is possible: 

$H_1$: $P(+) > P(-)$       one-sided alternative 
$H_1$: $P(+) < P(-)$       one-sided alternative 
$H_1$: $P(+) \neq P(-)$    two-sided alternative 


If the alternative hypothesis is 

$H_1$: $P(+) > P(-)$ 

a sufficiently large number of plus signs causes rejection of $H_0$. The test statistic 
is the number of plus signs. Suppose that the chosen level of significance is α. 
We reject $H_0$ if the probability (when $H_0$ is true) of obtaining as many plus signs 
as we actually obtain, or more, is equal to or less than α. Thus if α = 0.05 
and we observe 7 plus signs, we reject $H_0$ if the probability (when $H_0$ is true) of 
obtaining 7 or more plus signs is equal to or less than 0.05. 

Similarly, if the alternative hypothesis is 

$H_1$: $P(+) < P(-)$ 

a sufficiently large number of minus signs causes rejection of H 0 . The test statistic 
is the number of minus signs. 

If the alternative hypothesis is 

$H_1$: $P(+) \neq P(-)$ 

either a sufficiently large number of plus signs or a sufficiently large number of 
minus signs causes rejection of the null hypothesis. We may take as the test 
statistic the more frequently occurring sign. For a two-sided test with a significance 
level α, we reject $H_0$ if the probability, when $H_0$ is true, of obtaining as many 
of the more frequently occurring sign as we actually obtained, or more, is equal to 
or less than α/2. 

If a difference $(X_i - Y_i)$ is equal to 0, we eliminate the pair from the sample 
and reduce the value of n accordingly. 

EXAMPLE 12.8.1 The following experiment is designed to compare the effective¬ 
ness of two detergents in cleaning cotton fabric. Twelve pieces of fabric are 
uniformly soiled and then cut in half. One half of each piece is randomly assigned 
to be washed in Detergent A. The other half is washed in Detergent B. After the 
fabric specimens have been washed and dried, each piece is tested to determine 
the effectiveness of the detergent. We wish to know whether we can conclude, at 
the 0.05 level of significance, that the median difference is negative. The results 
are shown in Table 12.8.1. 

The hypotheses are as follows. 

$H_0$: The median of the differences is 0 [$P(+) = P(-)$] 

$H_1$: The median of the differences is negative [$P(+) < P(-)$] 

TABLE 12.8.1 Results of experiment described in Example 12.8.1 

Specimen              1   2   3   4   5   6   7   8   9  10  11  12 
Detergent A (X)       9   8   7   9   7   7   7   8   7   9   7   8 
Detergent B (Y)       8  10   8   8   9   9   8  10   9   9   8   9 
Sign of (X_i - Y_i)   +   -   -   +   -   -   -   -   -   0   -   - 

Since the alternative hypothesis is $P(+) < P(-)$, we have a one-sided test. 
The test statistic is the number of minus signs. Table 12.8.1 shows one 0, which 
we eliminate from the analysis (leaving n = 11 pairs), and 9 minus signs. We wish 
to know the probability of obtaining 9 or more minus signs when the probability of 
a minus sign is 0.5. That is, we wish to determine $P(k \geq 9 \mid 11, 0.5)$, where k 
is the test statistic, the number of minus signs. Appendix Table A shows this 
probability to be 0.0327. Since 0.0327 is less than 0.05, we reject $H_0$ and conclude 
that the median of the differences is negative. The p value is 0.0327. 
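
The table lookup in Example 12.8.1 can also be checked by direct computation. The following is a minimal sketch, not part of the original text: the function name `sign_test_upper_tail` and the variable names are ours, and the data are the detergent scores of Table 12.8.1. It drops zero differences, counts the minus signs, and sums the binomial probabilities for that many or more minus signs.

```python
from math import comb

def sign_test_upper_tail(k, n, p=0.5):
    """Return P(K >= k) for K ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# Scores from Table 12.8.1 (Detergent A = X, Detergent B = Y).
detergent_a = [9, 8, 7, 9, 7, 7, 7, 8, 7, 9, 7, 8]
detergent_b = [8, 10, 8, 8, 9, 9, 8, 10, 9, 9, 8, 9]

# Zero differences are eliminated, which reduces n accordingly.
diffs = [x - y for x, y in zip(detergent_a, detergent_b) if x != y]
n = len(diffs)
minus = sum(d < 0 for d in diffs)

# One-sided test of H1: P(+) < P(-); the test statistic is the minus count.
p_value = sign_test_upper_tail(minus, n)
print(n, minus, round(p_value, 4))
```

The same function serves the other one-sided alternative if we pass the number of plus signs instead.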

For samples of size 11 or larger, we can use the normal approximation to the 
binomial (recall Chapters 5, 6, and 7). The transformed test statistic is 

$$z = \frac{(k \pm 0.5) - 0.5n}{0.5\sqrt{n}} \qquad (12.8.1)$$

where k = the original test statistic, the number of plus or minus signs, whichever 
is appropriate. In Equation 12.8.1, k + 0.5 is used when k < n/2, and k − 0.5 
is used when k > n/2. We compare the computed z with the appropriate z value 
from the standard normal distribution for significance. 

The following example illustrates the use of Equation 12.8.1 with a large sam¬ 
ple. 


EXAMPLE 12.8.2 To investigate the effectiveness of different kinds of advertising, 
the market research department of a chain of discount stores conducts the following 
experiment in a random sample of 15 stores. During a certain week, automotive 
department specials are advertised by periodic announcements over the stores’ 
loudspeaker systems. During the following week, automotive department specials 
are advertised by window displays and other storewide visual advertising. The 
variable of interest is total dollar sales volume for the automotive department 
during the week. Table 12.8.2 shows the results. 

The hypotheses are as follows. Let α = 0.05. 

$H_0$: The median of the differences is 0 [$P(+) = P(-)$] 
$H_1$: The median of the differences is not 0 [$P(+) \neq P(-)$] 


TABLE 12.8.2 Dollar volume of sales for automotive departments of 15 discount stores 
under two experimental advertising conditions 

Store number   Announcements (X)   Visual displays (Y)   Sign of (X_i - Y_i) 
      1             $4,127               $4,147                   - 
      2              4,288                4,048                   + 
      3              4,024                4,853                   - 
      4              3,627                4,865                   - 
      5              4,813                4,376                   + 
      6              3,925                4,838                   - 
      7              4,840                3,526                   + 
      8              3,731                4,300                   - 
      9              3,779                4,672                   - 
     10              4,229                4,721                   - 
     11              3,977                4,770                   - 
     12              3,778                4,484                   - 
     13              3,602                4,389                   - 
     14              3,959                4,560                   - 
     15              4,918                3,848                   + 


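The more frequently occurring sign is the minus sign, which occurs k = 11 times 
among the n = 15 pairs. By Equation 12.8.1, we compute 

$$z = \frac{(11 - 0.5) - 0.5(15)}{0.5\sqrt{15}} = 1.55$$

Since the computed value of z = 1.55 < 1.96, we do not reject the null 
hypothesis. We conclude that the two methods of advertising may have equal 
effects [p = 2(0.0606) = 0.1212]. 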
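
The arithmetic of Example 12.8.2 can be reproduced with a short sketch such as the one below. It is an illustration only: the helper names `sign_test_z` and `upper_tail` are ours, the normal tail area is obtained from the complementary error function rather than from a table, and the handling of the case k = n/2 (no correction) is our assumption, since Equation 12.8.1 does not cover it.

```python
from math import erfc, sqrt

def sign_test_z(k, n):
    """Equation 12.8.1: continuity-corrected z for k signs among n pairs."""
    if k < n / 2:
        corrected = k + 0.5
    elif k > n / 2:
        corrected = k - 0.5
    else:
        corrected = k  # assumption: no correction when k equals n/2 exactly
    return (corrected - 0.5 * n) / (0.5 * sqrt(n))

def upper_tail(z):
    """Return P(Z >= z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))

# Weekly sales from Table 12.8.2 (announcements = X, visual displays = Y).
announcements = [4127, 4288, 4024, 3627, 4813, 3925, 4840, 3731,
                 3779, 4229, 3977, 3778, 3602, 3959, 4918]
displays = [4147, 4048, 4853, 4865, 4376, 4838, 3526, 4300,
            4672, 4721, 4770, 4484, 4389, 4560, 3848]

diffs = [x - y for x, y in zip(announcements, displays) if x != y]
n = len(diffs)
plus = sum(d > 0 for d in diffs)
k = max(plus, n - plus)  # two-sided test: use the more frequent sign

z = sign_test_z(k, n)
p_two_sided = 2 * upper_tail(z)
print(n, k, round(z, 2), round(p_two_sided, 3))
```

Because the tail area is computed from the unrounded z, the p value may differ from the tabled 0.1212 in the fourth decimal place.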

[The sign test is discussed in detail in an article by Dixon and Mood (1946).] 
We can use the Wilcoxon signed-rank test with paired data when the population 
of sampled differences satisfies the test’s assumptions. After we find the signed 
difference between each pair of observations, we proceed just as we do for the 
single-sample case. An advantage of the Wilcoxon test over the sign test is the 
fact that the Wilcoxon test uses more of the information inherent in the data than 
the sign test does. However, if the sampled population is not symmetric, the sign 
test is preferable. 


12.8.1 A firm wants to study the effect of piped-in music on productivity of employees. 
One department of a certain factory is selected at random to receive piped-in music for 30 
working days. There are 10 employees in the department. The following table shows the 
average daily output for 30 days before the introduction of music and the average daily 
output for the 30 days during which music is piped into the department. Can we conclude 
from these data that music increases productivity? Let α = 0.05. Determine the p value. 

Employee        A   B   C   D   E   F   G   H   I   J 

Before music   90  80  92  85  81  85  72  85  70  88 

During music   99  85  98  83  88  99  80  91  80  94 

12.8.2 In a study designed to test the effect of packaging on consumer acceptance of a 
certain candy bar, 27 candy bars are wrapped in a colorful wrapper (Method A). Also 27 
identical candy bars are wrapped in a plain wrapper (Method B). Twenty-seven subjects 
are asked to eat each of the bars and indicate their preference. The following table shows 
the results. Let the event “A preferred over B” be designated by a plus, and the event 
“B preferred over A” by a minus. Test to see whether the data provide sufficient evidence 
to indicate that the candy bar packaged by Method A is preferred over the bar in the plain 
wrapper. Let α = 0.05. Determine the p value. (n.p. = no preference) 


Subject  Bar preferred    Subject  Bar preferred    Subject  Bar preferred    Subject  Bar preferred 
   1          A              8          A             15         n.p.           22          A 
   2          A              9          A             16          A             23          A 
   3          A             10          A             17          A             24          A 
   4          B             11          A             18          A             25          A 
   5          B             12          B             19          A             26          A 
   6         n.p.           13          A             20          A             27          B 
   7          A             14          A             21          B 




12.8.3 A consumer affairs investigator conducts an experiment to probe the possibility of 
differential pricing by retail stores in a metropolitan area. At different times two subjects 
visit 12 retail stores in which prices are not posted. One subject projects the image of a 
member of the upper socioeconomic stratum of the area. The second subject projects the 
image of a member of the lower socioeconomic stratum of the area. In 10 instances the 
upper socioeconomic subject is quoted a lower price. Do these data provide sufficient 
evidence to indicate that more than 50% of the sampled firms practice differential pricing? 
Let α = 0.05, and determine the p value. 


12.9 THE KRUSKAL-WALLIS ONE-WAY ANALYSIS OF 
VARIANCE BY RANKS 


Chapter 8 showed that by means of analysis of variance, we may test the null 
hypothesis that several population means are equal. The simplest application of 
this technique is the use of one-way analysis of variance. However, the use of 
this technique rests on the assumptions that the sampled populations are normally 
distributed with equal variances. 

When either of these assumptions is not met, we need an alternative test. 
Perhaps the most widely used nonparametric alternative is the Kruskal-Wallis one¬ 
way analysis of variance by ranks (1952), which uses ranks rather than original 
observations. (If the original observations themselves are ranks, they are used.) 
We use the Kruskal-Wallis test only if the samples are independent. 

We replace the observations by ranks from 1, corresponding to the smallest 
observation, to n, corresponding to the largest observation in the combined set of 
data. In the event of ties, we replace each tied observation by the mean of the 
ranks for which it is tied. We compute the following statistic: 


$$H = \frac{12}{n(n + 1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(n + 1) \qquad (12.9.1)$$

where k = the number of samples, $n_j$ = the number of observations in the jth 
sample, $n = \sum n_j$, the total number of observations in all samples, and $R_j$ = the 
sum of the ranks in the jth sample. 

A large value of H tends to cast doubt on the null hypothesis that the k samples 
are drawn from identically distributed populations. When there are only three 
groups and five or fewer observations in each group, we determine the significance 
of the statistic H by referring to Appendix Table M. For values of $n_j$ and k not 
included in Table M, we compare the computed value of H for significance with 
tabulated values of $\chi^2$ with k − 1 degrees of freedom. [Gabriel and Lachenbruch 
(1969) discuss the adequacy of the $\chi^2$ approximation when the sample sizes are 
small.] 

If there are ties, we can adjust H by dividing by 

$$1 - \frac{\sum T}{n^3 - n} \qquad (12.9.2)$$

where $T = t^3 - t$. The t designates the number of tied observations in a group 
of tied observations. The effect of the adjustment is to inflate H, so that if the 
unadjusted H is significant, the adjustment is not necessary. 



EXAMPLE 12.9.1 Researchers conduct an experiment to determine whether different 
methods of producing a certain molded part result in different mean tensile strengths. 
Since the researchers are unwilling to make the assumptions necessary for the 
parametric one-way analysis of variance, they use the Kruskal-Wallis test. Table 
12.9.1 shows the tensile strengths of the parts produced by the different methods, 
with the ranks in parentheses. 

The value of H that we may compute by Equation 12.9.1 is 

$$H = \frac{12}{33(33 + 1)} \left[ \frac{(136)^2}{7} + \frac{(236)^2}{8} + \frac{(132)^2}{8} + \frac{(57)^2}{10} \right] - 3(33 + 1) = 27.49$$


Appendix Table F shows that with 3 degrees of freedom, the computed value of 
H is significant. Consequently we can reject the hypothesis that the methods yield 
equal mean tensile strengths (p < 0.005). 

Since the computed H is significant, we would not in practice go to the trouble 
of adjusting for the ties that occurred in the assigning of ranks. To demonstrate 
the method, however, we shall compute an adjusted H. There were seven groups 
of ties. In four groups, two observations are tied. In one group, three are tied. In 
two groups, four are tied. From these ties, we compute 

$$\sum T = 4(2^3 - 2) + (3^3 - 3) + 2(4^3 - 4) = 168$$

so that the correction factor is 

$$1 - \frac{168}{33^3 - 33} = 0.9953$$

and the computed statistic is 

$$H_c = \frac{27.49}{0.9953} = 27.62$$
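
The ranking, the statistic of Equation 12.9.1, and the tie adjustment of Equation 12.9.2 can be sketched in a few lines. This is an illustrative sketch under our own naming (`midranks` and `kruskal_wallis` are hypothetical helpers, not from the text); the data are the tensile strengths of Table 12.9.1.

```python
from collections import Counter

def midranks(values):
    """Map each distinct value to the mean of the ranks it occupies."""
    counts = Counter(values)
    ranks, next_rank = {}, 1
    for v in sorted(counts):
        t = counts[v]                       # size of this group of ties
        ranks[v] = next_rank + (t - 1) / 2  # mean of next_rank .. next_rank+t-1
        next_rank += t
    return ranks, counts

def kruskal_wallis(samples):
    """Return (rank sums, H, tie-adjusted H) for a list of samples."""
    pooled = [v for s in samples for v in s]
    n = len(pooled)
    ranks, counts = midranks(pooled)
    rank_sums = [sum(ranks[v] for v in s) for s in samples]
    h = (12 / (n * (n + 1))
         * sum(r * r / len(s) for r, s in zip(rank_sums, samples))
         - 3 * (n + 1))
    sum_t = sum(t**3 - t for t in counts.values())  # untied values add 0
    return rank_sums, h, h / (1 - sum_t / (n**3 - n))

# Tensile strengths (psi) from Table 12.9.1, methods A through D.
methods = [
    [80, 88, 87, 86, 90, 88, 85],
    [99, 91, 98, 98, 99, 96, 92, 98],
    [89, 82, 81, 80, 86, 86, 86, 84],
    [76, 77, 75, 78, 76, 73, 71, 80, 75, 80],
]
rank_sums, h, h_adjusted = kruskal_wallis(methods)
print(rank_sums, round(h, 2), round(h_adjusted, 2))
```

Summing $t^3 - t$ over every distinct value is harmless, because a value that occurs once contributes $1^3 - 1 = 0$.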


TABLE 12.9.1 Tensile strengths (psi) of molded parts produced by four different 
methods (ranks in parentheses) 

A             B             C             D 
80 (10.5)     99 (32.5)     89 (24)       76 (5.5) 
88 (22.5)     91 (26)       82 (14)       77 (7) 
87 (21)       98 (30)       81 (13)       75 (3.5) 
86 (18.5)     98 (30)       80 (10.5)     78 (8) 
90 (25)       99 (32.5)     86 (18.5)     76 (5.5) 
88 (22.5)     96 (28)       86 (18.5)     73 (2) 
85 (16)       92 (27)       86 (18.5)     71 (1) 
              98 (30)       84 (15)       80 (10.5) 
                                          75 (3.5) 
                                          80 (10.5) 
R_1 = 136     R_2 = 236     R_3 = 132     R_4 = 57 

To understand the use of Table M, consider the following example. 


EXAMPLE 12.9.2 We wish to know whether we can conclude that three types of 
fertilizer have different effects on the mean yield in bushels per acre of a certain 
grain. Each of the three types of fertilizer is applied to four one-acre plots of 
ground. These plots are as alike with respect to relevant variables as possible. 
The plots are all treated alike during the growing season. Table 12.9.2 shows the 
yields of the twelve plots. 

TABLE 12.9.2 Yields (bu/acre) of grain receiving three types of fertilizer 
(ranks in parentheses) 

A            B            C 
45 (6)       42 (3)       53 (9) 
40 (1)       44 (5)       56 (12) 
41 (2)       43 (4)       54 (10) 
46 (7)       47 (8)       55 (11) 
R_1 = 16     R_2 = 20     R_3 = 42 

From the data in Table 12.9.2, we compute 

$$H = \frac{12}{12(12 + 1)} \left[ \frac{(16)^2 + (20)^2 + (42)^2}{4} \right] - 3(12 + 1) = 7.54$$

From Table M, the probability of obtaining a value of H as large as 7.54 when 
the samples are drawn from identical populations is 0.011. Thus we may reject a 
null hypothesis of equal treatment means at the 0.05 level of significance. We 
conclude that the fertilizers have different effects (p = 0.011). 


Exercises 



12.9.1 A subject is asked to rank 15 samples of coffee in order of preference from least 
preferred (1) to most preferred (15). Unknown to the subject, the 15 samples consist of 5 
samples of each of 3 brands. The following table shows the rankings by brand. Test the 
null hypothesis that the three brands are equally preferred. Let α = 0.05. Determine the 
p value. 



Brand A 9 10 11 12 13 

Brand B 14 1 5 7 8 

Brand C 2 3 4 15 6 

12.9.2 A manufacturer of cake mixes wants to compare four new formulas for cake mix. 
Five cakes are baked from each of the four formulas. A panel of judges, unaware of the 
differences in formulas, gives each cake a score as shown in the following table. Test the 
null hypothesis of no difference among cake mixes. Let α = 0.05. Determine the p value. 



Formula A   12  88  70  87  71 

Formula B   85  89  86  82  88 

Formula C   94  94  88  87  89 

Formula D   91  93  92  95  94 


12.9.3 Utility company officials wish to compare the bill-paying habits of customers living 
in four neighborhoods. They analyze random samples of bills from each of the four 
neighborhoods to determine the number of days between the date the bill is mailed and the date 
payment is received. The results are as shown in the following table. Do these data provide 
sufficient evidence to indicate that the bill-paying habits of the populations differ? Let 
α = 0.01. Determine the p value. 


Community A   27  17  21  26  25  18  17  23  17  23 

Community B   13  13  14  11  13  13   9  13  14  10 

Community C   11  13  19  13  16  13  12  17  19  16 

Community D    7  13  14   9  12   8  12  12   9  13 


12.9.4 Researchers run an experiment to evaluate the effectiveness of three different 
methods of teaching problem solving. The following table shows, by treatment, the 15 
participating subjects ranked on the basis of ability to solve problems after training. Do these 
data provide sufficient evidence to indicate that the three teaching methods differ in 
effectiveness? Let α = 0.05. 


Method A 5 7 9 3 6 

Method B 4 1 8 2 10 

Method C 15 13 11 12 14 

12.9.5 A manufacturer conducted a study to compare the characteristics of assembly-line 
employees. Employees were categorized into three performance groups: high, average, 
and low. Researchers selected a random sample from each group and interviewed and 
tested them in depth. The following table shows the self-concept scores of subjects in the 
three groups. Do these data provide sufficient evidence to indicate that the population 
groups differ with respect to median level of self-concept? Let α = 0.05, and find the p 
value. 


Low performers       50  60  58  63 

Average performers   81  87  85 

High performers      96  90  94  99  90 


12.10 THE FRIEDMAN TWO-WAY ANALYSIS 
OF VARIANCE BY RANKS 

The Kruskal-Wallis test (Section 12.9) is appropriate when the samples are in¬ 
dependently drawn from their respective populations. But we often want to analyze 
data from nonindependent, or related, samples. For two dependent samples, the 
sign test is an appropriate nonparametric method for testing the null hypothesis 
that the median difference is 0. When we wish to test the null hypothesis of no 
difference in treatment effects among three or more related samples, we often use 
the Friedman two-way analysis of variance by ranks [see Friedman (1937, 1940)]. 

This is called a two-way analysis of variance because it provides a nonpara¬ 
metric analogue to the parametric two-way analysis-of-variance technique used to 
analyze data from a randomized complete block experiment. The data are cast in 
a two-way table in which the rows correspond to the blocks and the columns 
correspond to the treatments. Each block may represent a different individual 
subjected to several experimental conditions. Or it may represent a different batch 
of some material, portions of which are treated in a different manner. 

The original observations may consist of ranks. For example, the blocks may 
represent subjects who rank several different machines (the treatments) in order 
of preference. If the original observations consist of scores, we convert them to 
ranks. We rank the observations in each of the n blocks separately, from 1, which 
is assigned to the smallest observation in the block, to k, which is assigned to the 
largest in the block. We assign tied observations the average of the ranks for 
which they are tied. If the null hypothesis is true, that is, if there is no difference 
in treatment (column) effects, the assignment of ranks to the columns will be the 
result of chance factors. On the other hand, if there are differences, we would 
expect a preponderance of large or small ranks in at least one of the columns. 

The Friedman statistic measures the extent to which the ranks within the 
columns depart from randomness by focusing on the sum of the ranks in each column. 
The test statistic, denoted by Friedman as $\chi_r^2$, is 

$$\chi_r^2 = \frac{12}{nk(k + 1)} \sum_{j=1}^{k} R_j^2 - 3n(k + 1) \qquad (12.10.1)$$

where n is the number of blocks (rows), k is the number of treatments (columns), 
and $R_j$ is the sum of the ranks in the jth column. A large value of $\chi_r^2$ reflects a 
large difference among the rank sums. It tends to cast doubt on the hypothesis of 
equal treatment effects. To determine whether or not a computed value of $\chi_r^2$ is 
large enough to cause rejection of the null hypothesis, we can consult Appendix 
Table N when n and k are small. For values of k and n not given in Table N, we 
can compare the computed $\chi_r^2$ for significance with tabulated values of $\chi^2$ with 
k − 1 degrees of freedom. 

EXAMPLE 12.10.1 Five technicians want to rank three calculators in order of 
preference. A rank of 1 indicates first preference. The null hypothesis to be tested is 
that there is no difference in preference for the three calculators. The alternative 
is that the three are not equally preferred. Table 12.10.1 shows the results. 

From the data in Table 12.10.1, we may compute 

$$\chi_r^2 = \frac{12}{5(3)(3 + 1)} \left[ (6)^2 + (10)^2 + (14)^2 \right] - 3(5)(3 + 1) = 6.4$$

Table N indicates that the probability of obtaining a value of $\chi_r^2$ as large as 6.4 
when the null hypothesis is true is 0.039. We may conclude, then, that the three 
calculators are not equally preferred (p = 0.039). 

TABLE 12.10.1 Five technicians' rankings of three calculators (A, B, C) 

Technician    A    B    C 
    1         1    2    3 
    2         1    2    3 
    3         2    1    3 
    4         1    2    3 
    5         1    3    2 
   R_j        6   10   14 
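
Equation 12.10.1 translates directly into code. The sketch below is illustrative only: the function name `friedman_statistic` is ours, and it assumes the within-block ranks have already been assigned, as they are in Table 12.10.1.

```python
def friedman_statistic(blocks):
    """Return (column rank sums, Friedman chi-square) per Equation 12.10.1.

    blocks: one row per block, each row holding the k within-block ranks.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [sum(row[j] for row in blocks) for j in range(k)]
    chi_r = (12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
             - 3 * n * (k + 1))
    return rank_sums, chi_r

# Rankings from Table 12.10.1: one row per technician, columns A, B, C.
rankings = [
    [1, 2, 3],
    [1, 2, 3],
    [2, 1, 3],
    [1, 2, 3],
    [1, 3, 2],
]
rank_sums, chi_r = friedman_statistic(rankings)
print(rank_sums, round(chi_r, 1))
```

A quick sanity check on any such table is that the rank sums total nk(k + 1)/2, here 5(3)(4)/2 = 30.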




TABLE 12.10.2 Health ratings of 45 puppies fed different formulas (A, B, C) 

Litter   1    2    3    4    5    6    7    8    9    10   11   12   13   14   15    R_j 

A        3    3    3    2.5  1    3    3    2    2.5  3    2    3    3    2    2     38 

B        2    1    2    2.5  3    2    2    3    2.5  1    3    1    1    3    3     32 

C        1    2    1    1    2    1    1    1    1    2    1    2    2    1    1     20 


In the following example, the samples are large. 

EXAMPLE 12.10.2 A puppy-food manufacturer wants to compare the effects of 
three formulas on the health of young puppies. The subjects consist of 3 litter 
mates of the same sex from each of 15 litters. One litter mate from each set is 
fed Formula A, one is fed Formula B, and the third is fed Formula C. At the end 
of the experimental period, a veterinarian rates the health of the puppies. The 
manufacturer wishes to know whether she can conclude on the basis of these 
results, shown in Table 12.10.2, that the three formulas have different effects. 
From the data in Table 12.10.2, we may compute 

$$\chi_r^2 = \frac{12}{15(3)(3 + 1)} \left[ (38)^2 + (32)^2 + (20)^2 \right] - 3(15)(3 + 1) = 11.2$$

Table F reveals that we can reject the null hypothesis of equal treatment effects, 
since 11.2 > 5.991. We conclude, then, that at least one formula is better than 
at least one of the others (p < 0.005). 
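
The bound p < 0.005 in Example 12.10.2 can be checked without a table, because with k − 1 = 2 degrees of freedom the upper-tail area of the chi-square distribution has the closed form $e^{-x/2}$. The following sketch is ours, not the text's: it recomputes the statistic from the rankings of Table 12.10.2 and applies that closed form, which holds only for 2 degrees of freedom.

```python
from math import exp

# Within-litter ranks from Table 12.10.2, litters 1 through 15.
formula_a = [3, 3, 3, 2.5, 1, 3, 3, 2, 2.5, 3, 2, 3, 3, 2, 2]
formula_b = [2, 1, 2, 2.5, 3, 2, 2, 3, 2.5, 1, 3, 1, 1, 3, 3]
formula_c = [1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 2, 2, 1, 1]

columns = [formula_a, formula_b, formula_c]
n, k = len(formula_a), len(columns)
rank_sums = [sum(col) for col in columns]

# Equation 12.10.1.
chi_r = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

# Upper-tail chi-square area; this closed form is valid for 2 df only.
p_value = exp(-chi_r / 2)
print(rank_sums, round(chi_r, 1), round(p_value, 4))
```

For other numbers of treatments the tail area has no such simple form, and the tabulated values of Table F (or a statistics library) would be needed.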


Exercises 



12.10.1 Fifteen management trainees with a large company are asked to rank five U.S. 
cities in order of preference as a place for permanent assignment. The results are as shown 
in the following table. Test the null hypothesis that the five cities are equally preferred. 
Let α = 0.05. Determine the p value. 


Trainee no.   1    2    3    4    5    6    7    8    9    10   11   12   13   14   15 

City A        2    3    3.5  2    3.5  3    5    5    2    2    3    4    1.5  5    5 

City B        3    2    3.5  1    3.5  4    1    3    5    5    5    2    1.5  4    4 

City C        1    4    5    3    1    1    4    1    1    3    4    1    3    3    2 

City D        4    5    2    4    5    5    3    2    4    1    2    3    4    2    3 

City E        5    1    1    5    2    2    2    4    3    4    1    5    5    1    1 



12.10.2 A textile manufacturer is considering three dye formulas for a certain synthetic 
fiber. He wishes to know whether he can conclude that the three do in fact differ in quality. 
To aid in his decision, he conducts an experiment in which five specimens of fabric are 
cut into thirds. One third is randomly assigned to be dyed by each of the three dyes. Each 
piece of fabric is later graded and assigned a score measuring the quality of the dye. The 
results are as follows. Test the null hypothesis that the dyes are of equal quality. Let 
α = 0.05. Determine the p value. 


Fabric specimen    1   2   3   4   5 

Dye A             74  78  76  82  77 

Dye B             81  86  90  93  73 

Dye C             95  99  90  87  93 







12.10.3 Ten subjects suffering from arthritis took part in an experiment to evaluate the 
relative effectiveness of three pain-relieving drugs. The following table shows how the 
subjects ranked the three drugs in order of preference as pain relievers. Do these data 
provide sufficient evidence to indicate that the three drugs are not equally preferred? Let 
α = 0.05, and determine the p value. 


Subject   1   2   3   4   5   6   7   8   9  10 

Drug A    1   1   1   1   1   1   2   1   2   1 

Drug B    2   2   2   3   2   3   1   3   1   3 

Drug C    3   3   3   2   3   2   3   2   3   2 



12.10.4 The production engineer for a certain manufacturing concern carried out an ex¬ 
periment to evaluate the effect of music on production. Randomly selected assembly-line 
employees of five branch plants participated. For one month slow, quiet music was piped 
into the assembly-line area. The following month the music was louder and faster. During 
the third month, there was no music at all. The following table shows the number of units 
produced per employee per hour under the three conditions. Can it be concluded on the 
basis of these data that the different conditions have a different effect on production? Let 
α = 0.05. Find the p value. 


Plant   Slow, quiet music   Loud, fast music   No music 
  1            750                 725             760 
  2           1000                 850             900 
  3            800                 825             830 
  4            950                 875             925 
  5            925                 900             890 


12.11 THE SPEARMAN RANK CORRELATION COEFFICIENT 

When the assumptions underlying the parametric correlation coefficient introduced 
in Chapter 9 are not met, there are several measures of correlation that we can 
use. One of the simplest and most widely used is the Spearman rank correlation 
coefficient (1904), designated by r s . As the name implies, r s is computed from 
data consisting of ranks. Suppose that the observations in their original form are 
not ranks. Then we can use the Spearman rank correlation coefficient as a measure 
of correlation if we can rank the observations according to magnitude from small¬ 
est to largest. 

The data consist of a bivariate random sample of size n. One variable is 
designated X, and the ranks consist of 1 (the observation on X that is smallest in 
magnitude), 2, . . . , n (the observation on X that is largest). The other 
variable is designated Y and ranked 1, 2, . . . , n, according to the relative 
magnitude of the observations. Alternatively, we may assign the rank of 1 to the 
largest value of X (and Y), and so on to the rank of n, which we assign to the 
smallest value of X (and Y). The direction of the ranking is immaterial so long as 
we rank both X and Y in the same direction. 

If the two rankings are perfectly and directly correlated, the rank of X will 
equal the rank of Y for all pairs. If the rankings are perfectly and inversely 


correlated, the smallest X rank will be paired with the largest Y rank, and so on. 
Finally, the largest X rank will be paired with the smallest Y rank. 

The Spearman rank correlation coefficient focuses on the differences between 
X and Y ranks (denoted $d_i$) as a measure of the extent to which the paired rankings 
depart from perfect direct or inverse correlation. Except in the case of perfect 
direct correlation, some of the $d_i$ will be negative. Because of the difficulty of 
working with negative numbers, we use the squared differences $d_i^2$ in the computation 
of $r_s$, which is given by 

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \qquad (12.11.1)$$

The larger the differences between the ranks of X and Y, the larger will be $\sum d_i^2$. 
If all the differences are 0, $\sum d_i^2$ will equal 0, $r_s$ will equal 1, and we consider the 
rankings perfectly and directly correlated. If we observe the maximum possible 
differences between the ranks of X and Y (that is, if the ranking of X is the reverse 
of the ranking of Y in each case), $\sum d_i^2$ will be a maximum and $r_s$ will equal −1. 
When the rankings are less than perfectly correlated, $r_s$ will be somewhere 
between +1 and −1. Remember that $r_s$ measures the strength of the association 
between ranks, not the values of the variates that have been ranked. 

If the data do not contain ties, the result of Equation 12.11.1 is equal to the 
result obtained using Equation 9.8.2 when the original observations are replaced 
by their ranks. 

We can use the Spearman rank correlation coefficient to test any one of the 
following hypotheses: 


1. $H_0$: The $X_i$ and $Y_i$ are mutually independent 

$H_1$: Either large values of $X_i$ tend to be paired with large values of $Y_i$ or large 
values of $X_i$ tend to be paired with small values of $Y_i$ 

2. $H_0$: The $X_i$ and $Y_i$ are mutually independent 

$H_1$: Large values of $X_i$ tend to be paired with large values of $Y_i$ 

3. $H_0$: The $X_i$ and $Y_i$ are mutually independent 

$H_1$: Large values of $X_i$ tend to be paired with small values of $Y_i$ 


The first pair of hypotheses specifies a two-sided test. The last two pairs specify 
one-sided tests. If we wish to know whether we can conclude that large values of 
$X_i$ tend to be paired with small values of $Y_i$, we test the $H_0$ specified in 3. If we 
wish to know whether we can conclude that large values of $X_i$ tend to be paired 
with large values of $Y_i$, we test the $H_0$ specified in 2. If we wish to detect a 
departure from independence in either direction, we test the $H_0$ specified in 1. 

The procedure for testing $r_s$ for significance depends on the sample size. If n 
is less than or equal to 30, we can consult Appendix Table O. Table O contains 
critical values of $r_s$ for various values of α. For values of n greater than 30, we 
can compute the statistic 

$$z = r_s \sqrt{n - 1} \qquad (12.11.2)$$

and compare it for significance with values of the standard normal distribution. 




TABLE 12.11.1 Scores of 15 employees evaluated by peers (X) and supervisors (Y) 

Employee          1   2   3   4   5   6   7   8   9  10  11  12  13  14  15 

Peers (X)        90  83  60  95  84  68  93  55  79  78  71  80  87  76  89 

Supervisors (Y)  90  89  63  87  85  57  81  68  60  65  67  76  70  55  69 


TABLE 12.11.2 Fifteen employees ranked as to congeniality and cooperativeness by 
peers and supervisors 

Employee    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15 

X rank     13   9   2  15  10   3  14   1   7   6   4   8  11   5  12 

Y rank     15  14   4  13  12   2  11   7   3   5   6  10   9   1   8 

d_i        -2  -5  -2   2  -2   1   3  -6   4   1  -2  -2   2   4   4 

d_i^2       4  25   4   4   4   1   9  36  16   1   4   4   4  16  16 


Ties may occur in the rankings of X, Y, or both. In such cases, the mean of
the ranks that would have been assigned had no ties occurred is assigned to the
tied ranks. [Other rank-correlation methods are discussed in Kendall (1955),
Kruskal (1958), and Hotelling and Pabst (1936), as well as in some of the other
references already cited.]
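The treatment of ties described above can be sketched in Python as follows. The function name `average_ranks` and the sample scores are hypothetical; the sketch simply assigns to each group of tied values the mean of the ranks they would otherwise occupy.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start+1 .. end+1 would have been assigned had no ties occurred.
        mean_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

scores = [73, 60, 73, 88]        # hypothetical scores containing one tie
print(average_ranks(scores))     # [2.5, 1.0, 2.5, 4.0]
```

Here the two scores of 73 would have occupied ranks 2 and 3, so each receives the rank 2.5.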


EXAMPLE 12.11.1 A random sample of 15 assembly-line employees of a large
manufacturing firm are evaluated by their peers and their supervisors as to
congeniality and cooperativeness on the job. Table 12.11.1 shows the scores. The
firm’s personnel director wishes to know whether he can conclude that the two
measures are directly correlated. Table 12.11.2 shows the resulting ranks.

The appropriate hypotheses are

$H_0$: The $X_i$ and $Y_i$ are mutually independent

$H_1$: Large values of $X_i$ tend to be paired with large values of $Y_i$


By Equation 12.11.1, we compute

$$r_s = 1 - \frac{6(148)}{15(15^2 - 1)} = 0.7357$$


Table O indicates that the probability of obtaining a value of $r_s$ as large as or
larger than 0.7357 when the null hypothesis is true is less than 0.005. We would
therefore reject the null hypothesis at the 0.05, 0.01, or 0.005 level of
significance, and we conclude that the two rankings are directly correlated
(0.005 > p > 0.001).
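The arithmetic of Example 12.11.1 can be checked with a short Python sketch. The helper names are our own, and the ranking helper assumes no tied scores, which holds for the data of Table 12.11.1.

```python
peers = [90, 83, 60, 95, 84, 68, 93, 55, 79, 78, 71, 80, 87, 76, 89]
supervisors = [90, 89, 63, 87, 85, 57, 81, 68, 60, 65, 67, 76, 70, 55, 69]

def ranks_no_ties(values):
    """Ranks 1..n for data without ties (true for this example)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """Return (r_s, sum of squared rank differences)."""
    n = len(x)
    rx, ry = ranks_no_ties(x), ranks_no_ties(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2

r_s, sum_d2 = spearman_rs(peers, supervisors)
print(sum_d2, round(r_s, 4))   # 148 0.7357
```

The printed values agree with the sum of the $d_i^2$ in Table 12.11.2 and with the computed $r_s$.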


Exercises

12.11.1 The personnel department of a large firm gives a test to 20 employees to measure
their degree of job satisfaction. The following table gives the results, along with the
employees’ average daily production (in units) during the past year. Convert the original
observations to ranks and test to see whether we can conclude that the two rankings are
directly correlated. Let $\alpha = 0.05$. Find the p value.



Employee                 1    2    3    4    5    6    7    8    9   10
Job satis. score        97   83   73   88   69   70   76   60   73   99
Ave. daily production  166  174  111  189  106  129  159  136  153  160

Employee                11   12   13   14   15   16   17   18   19   20
Job satis. score        97   62   87   89   98   93   85   79   64   85
Ave. daily production  165  121  166  169  189  161  195  145  138  174


12.11.2 A panel of 5 men and another panel of 5 women are asked to rank 10 ideas for
a new television program on the basis of their relative appeal to a general audience. The
results are shown in the following table. Test the null hypothesis that the rankings are
mutually independent against the alternative that they are inversely correlated. Let
$\alpha = 0.05$. Determine the p value.


Program idea  1   2  3  4  5  6  7  8  9  10

Men           6   4  8  7  2  1  3  5  9  10

Women         6  10  8  2  7  1  5  9  4   3

12.11.3 A panel of small-business experts ranks 15 small businesses on the basis of their 
employees’ job-satisfaction scores and on the basis of growth potential of the businesses. 
The results are given in the following table. Can we conclude from these data that there 
is a direct relationship between employee satisfaction within a firm and that firm’s growth 
potential? Let $\alpha = 0.01$. Find the p value.


Employee satisfaction 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 

Potential for growth 6 3 1 2 4 5 11 10 7 8 12 9 14 13 15 

12.11.4 The following table shows, for a random sample of college sophomores, grades 
made in a statistics course and the scores the students assigned the course on a
course-evaluation form. Can we conclude from these data that the two variables are
directly correlated? Let $\alpha = 0.05$. Find the p value.


Statistics grade 70 88 85 84 90 95 

Course evaluation score 4 8 6 5 9 2 

12.11.5 A random sample of 10 employees take part in a study to assess the relationship 
between the employee’s scores on a job-aptitude test and supervisors’ evaluations of their 
job performance. Compute $r_s$ and test to determine whether the two variables are directly
related. Let $\alpha = 0.05$. Find the p value. The data are as follows.


Aptitude score 13 41 72 24 57 100 84 36 92 63 

Supervisors' evaluation 31 19 81 43 50 74 63 24 100 96 


12.12 NONPARAMETRIC LINEAR REGRESSION 

Regression analysis is one of the most widely used statistical techniques. In
Chapters 9 and 10, we discussed simple linear regression and multiple-regression
analysis. In Chapter 9 we learned to use the method of least squares to compute, from
sample data, a and b, the y intercept and slope, respectively, of a sample regression
line. One can use this line to estimate the true regression line that describes the
linear relationship between two variables X and Y for the population from which
the sample was drawn. As we have seen, we can use the resulting equation,

y = a + bx

for prediction and estimation. We can do so only if certain assumptions about the
data (given in Chapter 9) are met and if we can conclude from the sample data
that X and Y are indeed related. To determine whether we may conclude that X
and Y are related, we test $H_0: \beta = 0$ against $H_1: \beta \neq 0$. If we reject $H_0$, we
conclude that, in the sampled population, X and Y are related, since if they are
not related, $\beta = 0$. In other situations we may wish to test the null hypothesis
that $\beta$ is of some magnitude other than zero. To repeat: The test is strictly valid
only if the assumptions in Chapter 9 are met. If we suspect that the data do not
conform to the necessary assumptions for the application of the techniques
described in Chapter 9, we must use an alternative procedure.

Conover (1980) describes a nonparametric test for $\beta$ that is an appropriate
alternative when the regression of Y on X is linear and the residuals are independent
of the X’s. The procedure is valid when X is random. It is also valid when X is
nonrandom, provided that the Y’s are independent and their populations are
identically distributed. We find point estimates of the sample slope and y intercept by
applying Formulas 9.4.5 and 9.4.6, respectively. Usually the statistics a and b
are part of the output of computer programs for simple linear regression analysis.

We may use the technique described by Conover to test any of the following
null hypotheses against their accompanying alternatives.

1. $H_0: \beta = \beta_0$, $H_1: \beta \neq \beta_0$

2. $H_0: \beta = \beta_0$, $H_1: \beta > \beta_0$

3. $H_0: \beta = \beta_0$, $H_1: \beta < \beta_0$

The procedure is as follows.

1. For each pair of measurements $(x_i, y_i)$, compute $u_i = y_i - \beta_0 x_i$.

2. Compute the Spearman rank correlation coefficient $r_s$ for the pairs $(x_i, u_i)$ as
described in Section 12.11.

3. The decision rules are as follows. For $H_1: \beta > \beta_0$, reject $H_0$ at the $\alpha$ level of
significance if the computed $r_s$ is greater than $r_s^*$ for given $n$ and $\alpha$ as shown in
Table O. For $H_1: \beta < \beta_0$, reject $H_0$ at the $\alpha$ level of significance if the computed
$r_s$ is less than $-r_s^*$ for given $n$ and $\alpha$. For $H_1: \beta \neq \beta_0$, reject $H_0$ at the $\alpha$ level
of significance if the computed $r_s$ is either greater than $r_s^*$ for $n$ and $\alpha/2$ or less
than $-r_s^*$ for $n$ and $\alpha/2$.

The following example shows the use of the technique. 

EXAMPLE 12.12.1 Researchers for a fertilizer manufacturer conducted an
experiment in which specified amounts of nitrogen (X) were added to pots containing
house-plant seedlings. Later, they determined the amounts of nitrogen in the
mature plants (Y). Table 12.12.1 shows the results. We wish to know whether we
can conclude, at the $\alpha = 0.05$ level, that the slope of the population regression
line describing the relationship between X and Y is greater than 0.5. We proceed
as follows.

TABLE 12.12.1 Data for Example 12.12.1

Plant     1     2     3     4     5     6     7     8     9    10    11    12
Y      0.33  0.62  0.35  0.75  0.66  0.22  0.83  0.73  0.42  0.79  0.94  0.36
X      0.20  0.60  0.40  0.90  0.70  0.10  1.10  0.80  0.30  1.00  1.20  0.50

TABLE 12.12.2 Ranks of x_i and u_i and intermediate calculations for
Example 12.12.1

i          1     2     3     4     5     6     7     8     9    10    11    12
x_i     0.20  0.60  0.40  0.90  0.70  0.10  1.10  0.80  0.30  1.00  1.20  0.50
u_i     0.23  0.32  0.15  0.30  0.31  0.17  0.28  0.33  0.27  0.29  0.34  0.11
R(x_i)     2     6     4     9     7     1    11     8     3    10    12     5
R(u_i)     4    10     2     8     9     3     6    11     5     7    12     1
d_i       -2    -4     2     1    -2    -2     5    -3    -2     3     0     4
d_i^2      4    16     4     1     4     4    25     9     4     9     0    16

The hypotheses are:

$H_0: \beta = 0.5$, $H_1: \beta > 0.5$

We will reject $H_0$ if the computed $r_s$ is greater than 0.4965, the value of $r_s^*$ in
Table O corresponding to $n = 12$ and $\alpha = 0.05$. The $u_i$ are as follows.

$u_1 = 0.33 - (0.5)(0.20) = 0.23$    $u_7 = 0.83 - (0.5)(1.10) = 0.28$

$u_2 = 0.62 - (0.5)(0.60) = 0.32$    $u_8 = 0.73 - (0.5)(0.80) = 0.33$

$u_3 = 0.35 - (0.5)(0.40) = 0.15$    $u_9 = 0.42 - (0.5)(0.30) = 0.27$

$u_4 = 0.75 - (0.5)(0.90) = 0.30$    $u_{10} = 0.79 - (0.5)(1.00) = 0.29$

$u_5 = 0.66 - (0.5)(0.70) = 0.31$    $u_{11} = 0.94 - (0.5)(1.20) = 0.34$

$u_6 = 0.22 - (0.5)(0.10) = 0.17$    $u_{12} = 0.36 - (0.5)(0.50) = 0.11$

Table 12.12.2 shows the array of the $x$’s and $u$’s, their ranks, the $d_i$, and the
$d_i^2$. The computed value of the test statistic, from Equation 12.11.1, is

$$r_s = 1 - \frac{6(96)}{12(144 - 1)} = 0.6643$$

Since 0.6643 is greater than 0.4965, we reject $H_0$. We conclude that the slope
of the population regression line is greater than 0.5. Since 0.6713 > 0.6643 >
0.5804, the p value for this test is 0.01 < p < 0.025.
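The steps of Example 12.12.1 can be sketched in Python as follows. The variable and function names are our own; the critical value 0.4965 is the one quoted in the example, and the ranking helper assumes no ties, which holds for these data. The $u_i$ are rounded to two decimals only to suppress floating-point noise.

```python
x = [0.20, 0.60, 0.40, 0.90, 0.70, 0.10, 1.10, 0.80, 0.30, 1.00, 1.20, 0.50]
y = [0.33, 0.62, 0.35, 0.75, 0.66, 0.22, 0.83, 0.73, 0.42, 0.79, 0.94, 0.36]
beta_0 = 0.5          # hypothesized slope under H0
critical = 0.4965     # critical value quoted in the example (n = 12, alpha = 0.05)

def ranks_no_ties(values):
    """Ranks 1..n for data without ties (true for this example)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Step 1: u_i = y_i - beta_0 * x_i
u = [round(yi - beta_0 * xi, 2) for xi, yi in zip(x, y)]

# Step 2: Spearman r_s for the pairs (x_i, u_i)
n = len(x)
sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_no_ties(x), ranks_no_ties(u)))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Step 3: one-sided decision rule for H1: beta > beta_0
reject = r_s > critical
print(sum_d2, round(r_s, 4), reject)   # 96 0.6643 True
```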


Exercises 


12.12.1 The following are scores made by a random sample of college students who had 
just completed their sophomore year. Can we conclude on the basis of these data that 
0? Let a = 0.05. Find the p value. 


Verbal fluency          224   280    522    370    391    605    420    291
Academic achievement   5.87  3.40   8.90   4.24   6.20   8.73  17.30   5.00

Verbal fluency          168   211    531    439    303    516    233    429
Academic achievement   4.31  7.21  19.41  11.60  12.30  16.61   3.51  13.99


12.12.2 Researchers studied the relationship between visual discrimination and reading
ability. They used a sample of 25 employees randomly selected from a large company. A
visual discrimination test and a reading test given to the subjects yielded the following
results. Can we conclude on the basis of these data that $\beta > 0.70$? Let $\alpha = 0.05$ and find
the p value. Here X denotes the visual discrimination score and Y denotes the reading
ability score.


X   91   91   94   92   87   91   93   89   94   87   90   93   94
Y  525  600  600  575  480  575  595  530  545  535  575  540  625

X   90   87   92   88   89   89   88   89   92   90   89   94
Y  545  440  490  460  545  600  525  510  510  495  480  525



This chapter introduced a variety of statistical techniques that you may use under 
the following conditions: 

1. When the assumptions underlying the parametric procedures, presented in
previous chapters, are not met

2. When the data represent measurements on a weak measurement scale 

3. When you need results in a hurry 

4. When the hypothesis does not involve a parameter 

You learned to test appropriate hypotheses using the following nonparametric 
procedures: 

1. The one-sample runs test 

2. The Wilcoxon test 

3. The Mann-Whitney test 

4. The sign test 

5. The Kruskal-Wallis one-way analysis of variance 

6. The Friedman two-way analysis of variance 

7. Spearman rank correlation 

8. Nonparametric regression analysis 

These procedures are characterized by either the fact that they do not depend 
on the form of the distribution from which the samples are drawn, or the fact that 
the hypotheses tested are not statements about population parameters. Procedures 
of the former type are called distribution-free procedures. Those of the latter type 
are called nonparametric procedures. For convenience, as well as by convention, 
we refer to both types as nonparametric procedures. 

Except for the runs test, the procedures presented in this chapter are
nonparametric analogues of parametric procedures presented in previous chapters.

[In addition to the references that were cited, the articles by Blum and Fattu
(1954), Lord (1953), McClure (1971), and Morris (1969) are of interest. The
article by Blum and Fattu provides a general overview of nonparametric statistics
and lists over 100 references. Lord’s article is a humorous treatment of the
“nature-of-the-data” problem. McClure gives an example of the use of nonparametric
statistics in marketing. Morris discusses the use of the computer with
nonparametric procedures. An extensive bibliography of nonparametric statistics is given
by Savage (1962). Daniel (1979) has prepared a bibliography of nonparametric
regression techniques.]


Review Questions 




1. Define: (a) parametric statistics, (b) nonparametric statistics, (c) distribution-free
statistics.

2. Under what conditions are nonparametric procedures used? 

3. Define: (a) measurement, (b) nominal scale, (c) ordinal scale, (d) interval scale, (e) 
ratio scale. 

4. List the advantages and disadvantages of nonparametric statistics. 

5. Describe a situation from your area of interest in which each of the following
nonparametric procedures could be used: (a) runs test, (b) Mann-Whitney test, (c) sign test,
(d) Kruskal-Wallis test, (e) Friedman test, (f) Spearman rank correlation, (g) Wilcoxon
test, (h) nonparametric regression analysis. Use real or realistic data and carry out an
appropriate hypothesis-testing procedure for each test.

6. A maker of sporting goods is testing a new material that can be used in the production 
of tennis balls. An attractive feature of the new material is the fact that it is less expensive 
than the material currently in use. To evaluate the new material, the company gives 15 
expert tennis players a supply of balls made from both the new and the old materials. The 
players use each type of ball during 10 hours of practice. They then say which of the two 
they prefer. Twelve of the 15 players say they prefer the balls made from the new material. 
Would you recommend that the company switch to the new material? Support your answer 
with an appropriate statistical analysis. Let $\alpha = 0.05$. Find the p value.

7. The desired mean length of a certain bolt produced in a factory is 80 mm. Slight 
random deviations from the desired mean are tolerable. Twenty consecutive bolts are 
measured, with the following results. Do these data provide sufficient evidence to indicate 
that deviations above and below the mean do not occur at random? Let $\alpha = 0.05$, and
find the p value.


80.3 80.5 80.4 79.5 79.3 80.2 79.7 80.7 79.8 80.8 

79.9 80.1 79.2 80.6 80.1 79.9 79.9 79.9 79.6 79.1 




8. Each of 16 randomly selected homemakers in a small town is given a complimentary
case of soft drinks for participating in a taste test. They have a choice of either
“no-deposit, no-return” or returnable bottles. Thirteen choose the returnable bottles. Do these
data provide sufficient evidence to indicate that homemakers in the town prefer soft drinks
in returnable bottles? Let $\alpha = 0.05$, and find the p value.

9. A market research team asks each member of a panel of consumers to guess the retail
price of 16 small household items on the basis of a simple inspection of the items. The
following table gives the actual retail price and the average prices guessed by members of
the panel. Compute the rank-correlation coefficient between the average guessed price and
the actual retail price. Test for significance. Let $\alpha = 0.01$, and determine the p value.

        Average   Actual            Average   Actual
        guessed   retail            guessed   retail
Item    price     price     Item    price     price

  1     1.29      1.45        9     2.05      1.99
  2     1.10      1.19       10     3.25      3.79
  3     2.40      2.29       11     4.75      4.25
  4     2.25      1.65       12     2.25      2.89
  5     1.95      2.49       13     4.05      3.90
  6     4.00      4.79       14     2.15      3.00
  7     2.98      3.75       15     2.60      3.78
  8     1.65      1.59       16     4.10      4.70

10. Ten applicants for credit cards are ranked on their credit-risk potential by two officials
(denoted X and Y) of the issuing bank. The results are given in the following table. Do
these data suggest a lack of independence between the two officials’ assessments of
applicants’ credit-risk potentials? Let $\alpha = 0.05$, and find the p value.

Applicant  A  B  C   D  E  F  G  H  I   J
X          3  4  9  10  6  1  7  2  5   8
Y          5  3  7   9  6  2  8  4  1  10


11. The following table shows data (in rank form) collected on 15 line managers with a
large industrial firm. The X variable is the number of years of management experience.
The Y variable is the quality of the managers’ decision-making abilities as assessed by
their supervisors. Do these data provide sufficient evidence to indicate a lack of
independence between the two variables? Let $\alpha = 0.01$, and determine the p value.

Rank of X  1  2  3  4.5  4.5  6  7  8  9.5  9.5  11  12  13  14  15

Rank of Y  3  2  1    7    8  5  4  6   14   15  12   9  10  11  13

12. A research team hired by a drug manufacturer wants to compare the effects of 5
different drugs on the reaction times of experimental animals. They randomly assign 20
animals to receive one of the 5 drugs. The following table shows the animals’ decreases
in reaction times (in milliseconds) to a standard stimulus after the drugs have been
administered. Do these data provide sufficient evidence to indicate a difference among the
drugs? Let $\alpha = 0.05$, and determine the p value.

Drug A  4.8  4.6  4.7  4.6
Drug B  4.9  5.2  4.5  4.1
Drug C  5.3  5.2  5.2  5.2
Drug D  5.2  5.1  5.0  5.0
Drug E  4.5  4.2  4.7  4.7


13. A fertilizer manufacturer hires a researcher to conduct an experiment comparing the 
effects of three different formulas on the yield of tomato plants. The researcher applies 
each formula at random to each of five plots of ground of uniform size in which tomatoes 
have been planted. Yields, in pounds per plot, are as shown in the following table. Do 
these data provide sufficient evidence to indicate a difference in the three formulas? Let 
$\alpha = 0.05$, and determine the p value.

Formula A  78.0  77.5  50.0  76.0  69.1
Formula B  51.0  61.0  49.5  55.4  51.5
Formula C  56.0  43.1  59.5  56.4  52.5


14. An official with a fast-food establishment conducts an experiment to compare 4
methods of storing hamburger meat. The variable of interest is a measure of the bacteria growth
after 48 hours of storage. Source of supply of raw meat is used as a blocking factor. Four
batches of meat from each source are randomly assigned, one to each storage method
(methods A, B, C, and D). The results are as shown in the following table. After we
eliminate the variability due to the source, do these data provide sufficient evidence to
indicate a difference in storage methods? Let $\alpha = 0.01$, and determine the p value.

Source of raw meat   V    W    X   Y   Z
Method A            20   70   80  50  30
Method B            10   60   40  30  10
Method C            40  110  100  50  50
Method D            10   60   40  30  10


15. Each of a random sample of 10 students ranked 5 accounting professors on the basis 
of teaching ability. The following table shows the results. Do these data provide sufficient 
evidence to indicate that some professors are preferred over others? Let $\alpha = 0.05$, and
determine the p value.

Student  1  2  3  4  5  6  7  8  9  10
Prof. A  1  2  1  1  2  2  1  2  1   2
Prof. B  3  3  4  2  1  3  2  1  2   1
Prof. C  2  1  2  3  3  1  4  3  4   3
Prof. D  4  5  3  5  4  5  3  4  3   4
Prof. E  5  4  5  4  5  4  5  5  5   5


16. A team of research psychologists believes that male college students are more assertive
than female college students. A random sample of $n_1 = 16$ college males made the
following scores on a test designed to measure assertiveness (X): 22.3, 24.1, 28.9, 32.6,
29.3, 15.0, 39.9, 36.8, 21.3, 32.3, 27.0, 33.0, 25.5, 22.5, 33.3, 33.7. The scores (Y)
made by a random sample of $n_2 = 19$ college females were: 18.1, 22.9, 10.0, 10.5, 19.1,
10.0, 10.0, 26.9, 12.5, 10.9, 19.1, 11.7, 14.6, 19.2, 11.4, 29.6, 33.4, 27.0, 25.3. Use
the Mann-Whitney test to determine whether or not the psychologists are justified in their
belief. Let $\alpha = 0.05$. Compute the p value. Higher scores indicate greater assertiveness.

17. Twelve randomly selected employees scheduled for transfer to a new and unfamiliar 
job took a test designed to measure their level of anxiety. The results by ethnic group were 
as follows. We wish to know whether these data provide sufficient evidence to indicate 
that population median anxiety scores differ among the three ethnic groups. Let $\alpha = 0.05$.
Determine the p value.

Group A  52  61  58  64
Group B  74  83  88
Group C  90  88  89  65  77


18. The following table categorizes 15 randomly selected white-collar employees
according to the area in which they were reared and their ranks on a test of verbal reasoning
ability. Do these data provide sufficient evidence to indicate that the three populations
differ with regard to median verbal reasoning ability? Let $\alpha = 0.05$. Determine the p
value.


Suburban 13 15 14 11 12 

Rural 1 6 10 2 3 

Inner city 8 5 9 7 4 

19. Seven randomly selected college seniors majoring in statistics were asked to rank three
models of pocket calculator in order of preference. The results are shown in the following
table. The investigator wished to know whether the models differ in their appeal to students.
Let $\alpha = 0.05$. Determine the p value.

Student  1  2  3  4  5  6  7
Model A  2  1  3  3  2  1  2
Model B  1  2  1  1  1  2  1
Model C  3  3  2  2  3  3  3


20. Select a simple random sample of size 20 from the population of employed heads of 
household given in Appendix II. Use the rank-correlation technique to see whether you 
can conclude that there is a direct relationship between age and annual salary. Let
$\alpha = 0.05$. Find the p value.

21. The following data are measures of cardiac well-being of eight hospitalized patients 
before and after treatment with an experimental drug. A larger value indicates a more 
desirable condition. 


Patient 1 2 3 4 5 6 7 8 

After 48 67 104 60 52 63 46 52 

Before 50 41 51 36 39 57 44 35 

Use the Wilcoxon test to determine whether we can conclude that the treatment is effective. 
Let $\alpha = 0.05$. Find the p value.

22. A random sample of 12 executives aged 50 or older participated in a physical-fitness 
program for six months. The following are the subjects’ serum cholesterol levels at the 
beginning and end of the program. 


Subject    1    2    3    4    5    6    7    8    9   10   11   12
Before   210  231  225  260  235  240  326  235  240  267  284  201
After    195  236  210  220  220  210  296  195  201  237  210  220

Do these data provide sufficient evidence to indicate that the physical-fitness program is
effective in lowering serum cholesterol values? Use the Wilcoxon test. Let $\alpha = 0.05$.
Find the p value.

23. At the end of the physical-fitness program, the 12 executives in Exercise 22 took a 
test to measure their endurance, with the following results: 936, 977, 891, 883, 844, 975, 
978, 873, 945, 946, 826, 855. Can we conclude on the basis of these data that the 
population mean is not 900? Let $\alpha = 0.05$. Find the p value. Use the Wilcoxon test.



Magazine Ads and the Fog Index 


Robert Gunning, in his book, The Technique of Clear Writing (McGraw-Hill, 
1968) uses what he calls the Fog Index to measure the readability of printed 
materials. Anyone can measure the Fog Index for a given book or article, just 
by using the following three-step procedure. 

1. Find the average sentence length. You do this by counting off a number of
sentences and then dividing the total number of words in them by the number
of sentences. For books and long articles, several 100-word passages are
examined this way.

2. For every 100 words of text, count the total number of words with three or 
more syllables. Do not count proper names, combinations of short easy words 
(such as fairyland), or three-syllable words that are created by adding -es, -ed, 
or -ing. 

3. Add the results of steps 1 and 2 and multiply by 0.4 to obtain the Fog Index 
value. 
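The three steps above can be sketched in Python as follows. The function name and the sample counts are hypothetical, and the sketch assumes that the words with three or more syllables have already been counted by hand according to the exclusions in step 2.

```python
def fog_index(total_words, sentences, long_words):
    """Fog Index from hand counts for one passage.

    total_words: number of words in the passage
    sentences:   number of sentences in the passage
    long_words:  words of three or more syllables, after the exclusions in step 2
    """
    avg_sentence_length = total_words / sentences       # step 1
    long_per_100 = 100 * long_words / total_words       # step 2, per 100 words
    return 0.4 * (avg_sentence_length + long_per_100)   # step 3

# Hypothetical 100-word passage with 5 sentences and 12 long words.
print(round(fog_index(100, 5, 12), 1))   # 12.8
```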

Shuptrine and McVicker* hypothesized that there is a high correlation
between the educational level of magazine audiences and the readability of the
ads appearing in these magazines. To see if their hypothesis would be
supported, they selected a sample of nine magazines. For each magazine, they
computed a measure of the educational level of its readers. They selected at
random six advertisements from each magazine, and computed the Fog Index
for each of the ads. They then averaged the six scores, to obtain a summary
Fog Index for the magazine's ads. Finally, they ranked the magazines on the
basis of readers' educational level and the Fog Index. They obtained a
Spearman rank correlation coefficient of 0.183. Is their hypothesis supported? Let
$\alpha = 0.05$. Find the p value for the test. What assumptions are necessary?

*F. Kelly Shuptrine and Daniel D. McVicker, "Readability Levels of Magazine Ads," Journal of Advertising 
Research, 21 (October 1981), 45-51. 


Business Ethics of Top versus Middle Managers 

We are all aware that ethics are concerned with the right and wrong aspects
of an action. But in today's fiercely competitive business world, what do
managers consider "right" and what "wrong"? Where do they draw the line? To
find out, Kam-Hon Lee* did a study of ethical standards of marketing
managers. He asked practicing managers to evaluate certain statements and
practices. One aspect of the topic with which he was concerned was whether top
and middle managers both had the same beliefs about ethics.

Participants in the study indicated the extent to which they agreed with ten
statements having to do with ethical behavior. As an example, one of the
statements was: "You produce an anti-dandruff shampoo that is effective with
one application. Your assistant says that the product would turn over faster if
the instructions on the label recommended two applications. Do you agree
that you would recommend two applications?" From subjects' responses, Lee
constructed the following ranking (from most agreeable, 1, to least agreeable,
10) of the statements by top managers (TM) and middle managers (MM).


Statement   1  2  3  4  5  6   7  8  9  10

Rank by TM  2  4  5  6  8  1  10  3  9   7

Rank by MM  1  3  6  5  8  2  10  4  9   7


Compute a measure of the agreement among ranks. What do you conclude 
from these results? What assumptions are required? 

*Kam-Hon Lee, "Ethical Beliefs in Marketing Management: A Cross-Cultural Study," European Journal of 
Marketing, 15, 1 (1981), 58-67. 



13. Time-Series Analysis 
and Index Numbers 


Chapter Objectives: This chapter introduces you to
some of the tools that are useful in the analysis of
time-series data. These tools will help you identify the
underlying trends and patterns that exist in
time-series data and will give you a basis for forecasting
the values for future time periods. After studying this
chapter and working the exercises, you should be able
to do the following.

1. Construct a trend line for time-series data 

2. Compute a moving average 

3. Calculate a measure of seasonal variation 

4. Calculate a measure of cyclical variation 

5. Use the trend equation and/or seasonal index 
values to obtain forecasts for future time periods 

6. Construct index numbers 



13.1 INTRODUCTION 


A time series is a sequence of values of some variable, or composite of variables,
taken at successive time periods. Examples of time series include the monthly
sales volume of a department store, annual production of steel in the United States,
annual births in a given state, and the weekly price of pork. In time-series analysis,
interest focuses on the variability from one time period to another exhibited by
the variable of interest. Why are there more births one year than another? Why
is the price of pork higher this week than last? Why are sales of a department
store not the same for December as for February?

The analysis of time-series data is of interest to those who wish to understand
the nature of past and present data, and to those who want to use knowledge of
past data to forecast the future. This chapter tries to convey an idea of the nature
of time-series data by presenting some analytical techniques and showing how we
use time-series data for forecasting.

Accurate forecasts are very important to both the short-range and long-range
planning of a business firm. Good forecasting techniques are needed, for example,
in the areas of production, capital investment, personnel management, and
inventory control. We have a variety of techniques for obtaining forecasts. A firm may
base its forecasts on the individual or collective opinions of its executives, the
hunches of its sales directors, or the convictions of its chief accountants.
Alternatively, the firm may incorporate statistical procedures in its forecasting methods.
The sections that follow discuss some of the procedures appropriate for this
purpose.

Components of Time-Series Data

The classical approach to time-series analysis begins with the premise that a typical
time series has the following four components:

1. Secular trend is the general behavior of a given variable over a long period of 
time. By observing the secular trend, we can characterize a time series as showing 
a downward trend, an upward trend, or a steady trend. 

2. Seasonal variation refers to variation of a periodic nature. It is not limited to
periodic variation associated with the seasons of the year, although variation of
this type is certainly important. Some examples of variables subject to seasonal
variation are production of certain farm products, sales volume of department
stores, boating accidents, and the number of cars passing a certain point between
downtown and suburbia. The unit of time referred to in discussing seasonal
variation is less than a year. It may be a quarter, a month, a week, a day, or part of
a day.

3. Cyclical variation refers to those up-and-down fluctuations that are observable 
over extended periods of time. These wavelike fluctuations, called business cycles, 
are different from seasonal fluctuations in that they cover longer periods of time, 
have different causes, and are less predictable. 

4. Irregular or erratic variation is that variation not accounted for by trend, cycle,
or seasonal factors. Sometimes called random variation, this component is not
systematic like the other components. In this discussion, random variation is
considered to be due to a host of unpredictable influences. Some feel that this
definition is an oversimplification. Nevertheless, we can get worthwhile results
from analyses based on this concept.

Given these four components of a time series, let us investigate the nature of
the relationship among them. The relationship is usually described by one of two
models: the multiplicative model or the additive model. We may present the
additive model as:

$$Y = T + S + C + E$$

where Y is an observed value of the variable of interest, T is the trend component,
S is the seasonal component, C is the cyclical component, and E is the erratic
component.

In the additive model, S, C, and E are quantitative deviations about T. This
model assumes that the components are independent of one another.

The most widely used model is the multiplicative model:

$$Y = T \cdot S \cdot C \cdot E$$

The symbols refer to the same sources of variation as in the additive model, but
they are expressed differently and are not numerically equivalent. In the additive
model, all the values were expressed in original units. But in the multiplicative
model, only one component, usually trend, is expressed in original units. The
other three components are expressed as relatives, or percentages.

Editing and Adjusting Data

Before we actually analyze time-series data, we should edit or adjust them where
necessary. Factors that we usually consider during the adjustment procedure are
calendar variation, price changes, population changes, and comparability.

The fact that the months of the year have a variable number of days introduces 
an obvious problem. The usual adjustment consists of dividing the total value for 
the month by the number of days in the month and multiplying the result by 
30.4167, the average number of days in a month. The multiplying factor for leap 
years is 30.5. 
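
The calendar adjustment just described can be sketched in a few lines of Python. This is only an illustration: the function name calendar_adjust and the monthly totals in the usage lines are hypothetical, while the factors 30.4167 and 30.5 are the ones given above.

```python
import calendar

AVERAGE_DAYS = 30.4167        # average month length in an ordinary year
AVERAGE_DAYS_LEAP = 30.5      # average month length in a leap year


def calendar_adjust(total, year, month):
    """Put a monthly total on an average-length-month basis."""
    days_in_month = calendar.monthrange(year, month)[1]
    factor = AVERAGE_DAYS_LEAP if calendar.isleap(year) else AVERAGE_DAYS
    return total / days_in_month * factor


# Hypothetical totals: 3,100 units in January 1983 and 2,900 in February 1984.
jan_1983 = calendar_adjust(3100, 1983, 1)   # 31-day month, ordinary year
feb_1984 = calendar_adjust(2900, 1984, 2)   # 29-day month, leap year
```

Both hypothetical months average 100 units per day, so after adjustment they differ only through the leap-year factor.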

If we are interested in quantity changes in a value series that is stated in dollars, 
we must make an adjustment for price changes. We do so by dividing each item 
in the series by an appropriate index. 

An actual increase over time in the total values of series data, such as volume 
of production, may be misleading because of population changes. In this case it 
is usually more meaningful to examine the variable of interest on a per capita 
basis rather than as observed totals. To convert total production to units per capita, 
for example, we divide production totals by population totals. 
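
As a rough sketch of the price and population adjustments, the Python lines below divide a dollar series by a price index and a production series by population. The helper names, the sample figures, and the assumption that the index is quoted with base period = 100 are ours, not part of the text.

```python
def deflate(values, price_index, base=100.0):
    """Divide each dollar value by its price index (index assumed quoted on base 100)."""
    return [v / (p / base) for v, p in zip(values, price_index)]


def per_capita(totals, population):
    """Divide each total by the population of the same period."""
    return [t / n for t, n in zip(totals, population)]


# Hypothetical two-year series, for illustration only.
dollar_sales = [110.0, 126.0]
index = [100.0, 105.0]
real_sales = deflate(dollar_sales, index)

production = [500.0, 600.0]
people = [100, 120]
units_per_person = per_capita(production, people)
```

In this made-up case production rises from 500 to 600, yet output per person is unchanged, which is the kind of misleading increase the per capita adjustment is meant to expose.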

For many reasons, data collected over a long period of time may not be
comparable. The definition of the item of interest may change over the years.
Procedures for reporting data may change from time to time. For example, monthly
data may have been presented first as averages and later as totals. Or the report 
period itself may have been changed to cover a longer or shorter period. 


13.2 SECULAR TREND 


Let us begin our analysis of time-series data by considering the measurement of 
trend. The increase or decrease in a series from one time period to another may 
be fairly constant. If so, a straight line may be an appropriate means of describing 
the trend. However, data do not always obligingly conform to a linear pattern. 
Other types of curves may be better. Because of the wide applicability of the 
straight line for describing time series, this section discusses only linear trends. 
We can express linear trend as 

y_t = a + bx (13.2.1)

where y_t is the value of the trend for a given time period, a is the y intercept of
the trend line, or the trend value when x is 0, b is the slope of the trend line, or
the change in y_t per unit of time, and x is the unit of time.

Note that Equation 13.2.1 is the same linear equation introduced in Chapter 9. 

In order to obtain an equation that we can use to describe a linear trend, we 
must have numerical values for the y intercept (a) and the slope (b) of Equation 
13.2.1. We can obtain these values by (1) a freehand fitting of a line to the data,
(2) a method known as the method of semiaverages, and (3) the method of least
squares. Under certain circumstances, each method may give us satisfactory
results. The first two methods, however, have certain drawbacks that limit their
general application. The method of least squares, therefore, has become the method 
of choice among statisticians. It is not subject to these limitations, and it is easy 
to use. 

The study of regression analysis showed that if we can express the relationship 
between two variables Y and X by an equation of the form 

y = α + βx + e (13.2.2)

where α and β are constants and e is a random variable with a mean of 0 and
variance σ², then the method of least squares yields the best estimates from sample
data of the true slope and y intercept. Recall from Chapter 9 that, in order to
obtain values for a and b, we must solve the following normal equations:

Σy = na + bΣx (13.2.3)

Σxy = aΣx + bΣx² (13.2.4)

We can make our later calculations easier if we use codes rather than actual
time periods as the independent variable. Any convenient coding system is
satisfactory. For most applications, we assign codes to the time periods in such a
way that the sum of the codes is 0. With Σx = 0, the normal equations simplify to

a = Σy / n = ȳ (13.2.5)

b = Σxy / Σx² (13.2.6)



TABLE 13.2.1
Volume (billions of board feet) of timber cut from National Forest System areas, 1956-1970

Year    Volume      Year    Volume
1956      7.0       1964     11.1
1957      7.1       1965     11.4
1958      6.5       1966     12.3
1959      8.5       1967     11.0
1960      9.5       1968     12.3
1961      8.5       1969     12.0
1962      9.2       1970     11.7
1963     10.2

Source: Historical Statistics of the United States, Colonial Times to 1970, Bicentennial Edition, Part 1, Washington, D.C.,
1975. (Original data have been rounded.)

The method of assigning codes differs slightly depending on whether the number
of time periods involved is odd or even. The following examples illustrate the
complete procedure for both cases.

EXAMPLE 13.2.1 Table 13.2.1 shows the annual volume of timber cut from national
forest areas for commercial sales for the years 1956-1970. The data are rounded
to the nearest tenth-billion board foot. Fit a least-squares trend line to the data.

As a first step, we replace the years with codes. Since there is an odd number
of years, 15, we assign the code 0 to the middle year, 1963. We assign the numbers
1 through 7 to the years 1964 through 1970, respectively. And we assign the
numbers -1 through -7 to the years 1962 down through 1956, respectively.
Table 13.2.2 shows the codes, along with the other necessary computations.

The data are plotted in Figure 13.2.1. Plotting the data serves a useful purpose
in trend analysis. The resulting picture gives us a feel for the degree to which a
straight line fits the data. [Bry and Boschan (1971) offer suggestions on the
interpretation and analysis of time-series scatter diagrams.]

Substituting appropriate values from Table 13.2.2 into Equations 13.2.5 and
13.2.6 gives

a = 148.3 / 15 = 9.8867,    b = 117.4 / 280 = 0.4193

TABLE 13.2.2
Computations for Example 13.2.1

Year     Year code (x)    Volume (y)       xy       x²      y_t
1956         -7               7.0        -49.0      49      7.0
1957         -6               7.1        -42.6      36      7.4
1958         -5               6.5        -32.5      25      7.8
1959         -4               8.5        -34.0      16      8.2
1960         -3               9.5        -28.5       9      8.6
1961         -2               8.5        -17.0       4      9.0
1962         -1               9.2         -9.2       1      9.5
1963          0              10.2          0         0      9.9
1964          1              11.1         11.1       1     10.3
1965          2              11.4         22.8       4     10.7
1966          3              12.3         36.9       9     11.1
1967          4              11.0         44.0      16     11.6
1968          5              12.3         61.5      25     12.0
1969          6              12.0         72.0      36     12.4
1970          7              11.7         81.9      49     12.8
Total                       148.3        117.4     280

FIGURE 13.2.1
Volume (billions of board feet) of timber cut from National Forest System areas, 1956-1970

We can now write the equation of the trend line as

y_t = 9.8867 + 0.4193x

Note that a = 9.8867 is the trend value for 1963, the year for which x = 0. This
should always be made clear in the analysis. It may be shown as follows:

y_t = 9.8867 + 0.4193x

(origin, 1963; time unit, 1 year; y, annual volume of timber cut 1956-1970 in
billions of board feet).

The next step in the analysis is to substitute the various observed values of x
into the equation in order to obtain the values of y_t shown in the last column of
Table 13.2.2. Figure 13.2.2 shows the plotted calculated trend line, along with
the original data.
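
The computations of Example 13.2.1 can be checked with a short Python sketch. It assumes nothing beyond the data of Table 13.2.1 and Equations 13.2.5 and 13.2.6; the variable names are ours.

```python
# Annual volume of timber cut, 1956-1970 (Table 13.2.1).
volume = [7.0, 7.1, 6.5, 8.5, 9.5, 8.5, 9.2, 10.2,
          11.1, 11.4, 12.3, 11.0, 12.3, 12.0, 11.7]

n = len(volume)                              # 15 years, an odd number
codes = [i - n // 2 for i in range(n)]       # -7, ..., 0, ..., 7 (sum is 0)

# Equations 13.2.5 and 13.2.6, valid because the codes sum to zero.
a = sum(volume) / n
b = sum(x * y for x, y in zip(codes, volume)) / sum(x * x for x in codes)

# Trend value for every year (last column of Table 13.2.2).
trend = [a + b * x for x in codes]

print(round(a, 4), round(b, 4))              # 9.8867 0.4193
```

Rounding the entries of trend to one decimal place reproduces the y_t column of Table 13.2.2.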

FIGURE 13.2.2
Volume (billions of board feet) of timber cut from National Forest System areas, 1956-1970, original
data and trend line

The following example illustrates the fitting of a least-squares trend line when
the number of time periods is even.

EXAMPLE 13.2.2 Table 13.2.3 shows the number of job openings that occurred
each year from 1972-1983 at a large industrial firm. The data are plotted in Figure
13.2.3. From the plot of the original data, it appears that there is a good possibility
of getting a good straight-line fit. Table 13.2.4 shows the year codes and
computations needed for obtaining a least-squares equation.

TABLE 13.2.3
Annual number of job openings at a large industrial firm, 1972-1983

Year    Number of openings      Year    Number of openings
1972          473               1978          531
1973          465               1979          546
1974          472               1980          553
1975          477               1981          568
1976          505               1982          586
1977          516               1983          595

FIGURE 13.2.3
Number of job openings occurring at a large industrial firm, 1972-1983

Here’s how we assign year codes in this example. When the number of time
periods involved is even, there is no middle time period to which we can assign
the value 0. In order to have the sum of the codes equal 0, we assign the value
0 to a midpoint in time between the two middle time periods. We give the observed
time period preceding the midpoint a code of -1. We give the time period prior
to that a code of -3. And so on, to the first time period in the series. We give
the observed time period following the midpoint a code of +1. We give the next
time period a code of +3. And so on, to the last time period in the series.

Proceeding with the analysis, we find the numerical values of a and b by
substituting data from Table 13.2.4 into Equations 13.2.5 and 13.2.6. These
values are

a = 6287 / 12 = 523.92,    b = 3621 / 572 = 6.33

TABLE 13.2.4
Computations for Example 13.2.2

Year     Year code (x)    Number of openings (y)       xy       x²      y_t
1972         -11                  473               -5,203     121      454
1973          -9                  465               -4,185      81      467
1974          -7                  472               -3,304      49      480
1975          -5                  477               -2,385      25      492
1976          -3                  505               -1,515       9      505
1977          -1                  516                 -516       1      518
1978          +1                  531                  531       1      530
1979          +3                  546                1,638       9      543
1980          +5                  553                2,765      25      556
1981          +7                  568                3,976      49      568
1982          +9                  586                5,274      81      581
1983         +11                  595                6,545     121      594
Total                           6,287                3,621     572

We can now write the trend equation as

y_t = 523.92 + 6.33x

(origin, 1977-1978; time unit, 6 months; y, number of job openings). Substituting
the various values of x into this equation gives the trend values shown in the last
column of Table 13.2.4. The computed trend line is plotted in Figure 13.2.4.
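
A minimal Python sketch of the even-period coding follows, using the data of Table 13.2.3. The variable names are ours. Because each code step of 2 corresponds to one year, the yearly change in trend is 2b, a point the sketch makes explicit.

```python
# Annual job openings, 1972-1983 (Table 13.2.3).
openings = [473, 465, 472, 477, 505, 516, 531, 546, 553, 568, 586, 595]

n = len(openings)                               # 12 years, an even number
codes = [2 * i - (n - 1) for i in range(n)]     # -11, -9, ..., -1, +1, ..., +11

a = sum(openings) / n                           # Equation 13.2.5
b = sum(x * y for x, y in zip(codes, openings)) / sum(x * x for x in codes)  # 13.2.6

trend = [a + b * x for x in codes]              # last column of Table 13.2.4
change_per_year = 2 * b                         # one year = two six-month units

print(round(a, 2), round(b, 2))                 # 523.92 6.33
```

The slope of 6.33 is therefore a change per six months; the trend rises by about 12.66 openings per year.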




FIGURE 13.2.4
Number of job openings occurring at a large industrial firm, 1972-1983

As previously indicated, a straight line is not always the line of best fit for a
set of time-series data. Other types of curves might be useful, depending on the
data. Examples include the parabola, exponential curves, the Gompertz curve,
and the Pearl-Reed, or logistic, curve. The last two are widely used for describing
growth.

Trend analysis is of primary importance in business forecasting, an activity that
is so vital to any business. One use of forecasting is to estimate future demand
(or sales) of a product. As we have seen, trend analysis consists of applying
certain statistical procedures to historical data. A firm may use the results of this
analysis to make estimates or projections into the future. The value of such
estimates depends on how well past experience represents future experience, after
proper adjustment for trend, seasonal, cyclical, and erratic influences. It is
meaningless, even dangerous, for a firm to make such projections unless it seems
that the future will be fairly much like the past.

Note that the data for Example 13.2.2 are still centered at the middle of each
year, even though the time unit is six months.

Shifting the Origin

At times, we may want to change the origin or base time period (x = 0) to another
time period, using the same unit of time. We do this by adding the slope of the
trend line b to the y intercept of the trend line a (or subtracting it) once for each
unit by which we want to move the origin. For example, y_t = (a + 3b) + bx advances
the origin three time units, whereas y_t = (a - 5b) + bx moves the origin back five
time units. The following example shows this procedure.

EXAMPLE 13.2.3 Refer to Example 13.2.1 and move the origin six units forward, 
from 1963 to 1969. Do this as follows: 

y_t = (a + 6b) + bx = 9.8867 + 6(0.4193) + 0.4193x
    = 12.4025 + 0.4193x

(Origin, 1969; time unit, 1 year; annual volume of timber cut, billions of board
feet.)

When we use the original secular-trend equation, the code 4 represents the year
1967. When we use the new equation, the code -2 represents the year 1967.
Substituting these codes in the appropriate secular-trend equation yields

y_t = 9.8867 + 0.4193(4) = 9.8867 + 1.6772 = 11.5639

and

y_t = 12.4025 + 0.4193(-2) = 12.4025 - 0.8386 = 11.5639

As we see, both equations give the same results.
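
The shift of origin in Example 13.2.3 can be verified with the following Python sketch. The helper shift_origin is a hypothetical name introduced here for illustration; the coefficients are those of Example 13.2.1.

```python
def shift_origin(a, b, k):
    """Return the intercept and slope after moving the origin k time units.

    k > 0 moves the origin forward in time, k < 0 moves it back.
    The slope is unchanged because the time unit is unchanged.
    """
    return a + k * b, b


a, b = 9.8867, 0.4193                  # trend line with origin 1963
a_1969, b_1969 = shift_origin(a, b, 6) # new origin: 1969

# Trend value for 1967 under each equation.
old_code, new_code = 4, -2             # 1967 - 1963 and 1967 - 1969
y_old = a + b * old_code
y_new = a_1969 + b_1969 * new_code

print(round(a_1969, 4), round(y_old, 4), round(y_new, 4))   # 12.4025 11.5639 11.5639
```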





Exercises

13.2.1 The following table shows, for a manufacturing firm, the number of items damaged
in shipment, 1970-1984. (a) Plot the data as presented. (b) Compute the least-squares
trend equation for the data, and plot the trend line. (c) Compute y_t for each year.

Year    No. of items      Year    No. of items
1970        533           1978        291
1971        373           1979        228
1972        132           1980        204
1973        555           1981        349
1974        168           1982        234
1975        281           1983        209
1976         72           1984        176
1977        175

13.2.2 The following table shows the volume of sales, in thousands of dollars, of a retail
store, 1971-1984. Carry out steps (a) through (c) as in Exercise 13.2.1.

Year    Sales (x $1000)      Year    Sales (x $1000)
1971          815            1978        12,529
1972        1,276            1979        12,824
1973        4,752            1980        13,777
1974        7,535            1981        15,379
1975       10,122            1982        18,705
1976        9,642            1983        17,632
1977       14,100            1984        16,571

13.2.3 The following table shows the number of items repaired under warranty by a certain
company, 1971-1984. Carry out steps (a) through (c) as in Exercise 13.2.1.

Year    Number      Year    Number
1971     749        1978     611
1972     709        1979     600
1973     700        1980     574
1974     678        1981     559
1975     611        1982     543
1976     641        1983     534
1977     631        1984     524

13.2.4 The following table shows the annual sales of a certain company over 11 years.
Carry out steps (a) through (c) as in Exercise 13.2.1.

Year    Sales (millions of dollars)      Year    Sales (millions of dollars)
1972              12                     1978              20
1973              14                     1979              22
1974              18                     1980              27
1975              20                     1981              24
1976              18                     1982              30
1977              16


13.3 THE MOVING AVERAGE 

An alternative method of measuring trend is the method of moving averages. This 
method is nonlinear in the sense that it does not result in a straight line. However, 
it does temper, or smooth out, peaks and valleys in a set of observations. A 
moving average is defined as follows: 

A moving average is an artificially constructed time series in which the value 
for a given time period is replaced by the mean of that value and the values 
for some number of preceding and succeeding time periods. 

For example, suppose that we have a time series consisting of annual data for the 
years 1910 to 1980. We wish to compute the three-year moving average. We take 
the value for 1911, add to it the values for 1910 and 1912, and divide by 3. We 
then use the resulting average in place of the original value for the year 1911. 
We continue this procedure through 1979. We cannot compute a three-year
average for the year 1980 (since there is no succeeding value to add to the values
for 1979 and 1980), nor for the year 1910 (since there is no preceding year to 
add to the years 1910 and 1911). We can compute a moving average for any 
number of years. In general, the greater the number of time periods covered, the 
smoother the resulting curve. 

The objective in constructing a moving average is to bring out the trend by 
eliminating any obscuring seasonal, cyclical, or random fluctuations. For this 
reason, the choice of a period for a moving average is important. If there is a 
cycle of uniform length and height, we can eliminate it by choosing a period for 
the moving average that is either equal to or a multiple of the cycle. 



TABLE 13.3.1
Number of units of real estate sold by a certain broker, 1952-1982

Year    No. units      Year    No. units      Year    No. units
1952        1          1963       25          1973       83
1953       14          1964        9          1974       96
1954       14          1965       35          1975       96
1955       24          1966       35          1976      142
1956       25          1967       55          1977       46
1957       19          1968       22          1978       86
1958        8          1969       48          1979       16
1959       17          1970       76          1980      156
1960       15          1971       75          1981      143
1961       44          1972      199          1982       32
1962       40



Although the moving-average technique is useful under certain conditions, it 
has drawbacks. One of these is that we lose values for some years at the beginning 
and end of the series. In the example just cited, we lost one year at the beginning 
and one year at the end of the series. If we compute a five-year moving average, 
we lose two years at each end, and so on. Another disadvantage is the fact that 
the method of moving averages does not yield an equation for use in forecasting. 
A moving average is tedious to compute. The reason we present the method here 
is that we use its computational techniques in Section 13.4. As we will show, the 
computation of a 12-month moving average is an integral step in one method of 
eliminating seasonal variation. 
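
A centered moving average for an odd number of periods can be sketched in Python as follows. The function name moving_average is ours, and the sketch deliberately rejects even periods, which need the extra centering step taken up in Section 13.4. The sample values are the first ten years of Table 13.3.1.

```python
def moving_average(values, period):
    """Centered moving average for an odd period.

    Positions at either end that lack a full window are returned as None,
    which makes the loss of (period - 1) / 2 values at each end visible.
    """
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    averages = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        averages[i] = sum(window) / period
    return averages


units = [1, 14, 14, 24, 25, 19, 8, 17, 15, 44]    # 1952-1961, Table 13.3.1
five_year = moving_average(units, 5)
three_year = moving_average(units, 3)
```

The five-year result has no value for the first two and last two years, while the three-year result loses only one year at each end.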

The following example shows the techniques of computing a moving average. 

EXAMPLE 13.3.1 Table 13.3.1 shows the number of units of real estate sold by a
certain broker during the years 1952-1982. The objective is to compute the five-year
moving average for these data. Table 13.3.2 shows the original data, along
with the five-year moving totals and five-year moving-average values. Figure
13.3.1 plots the original data and the moving average.

TABLE 13.3.2
Five-year moving totals and five-year moving average, Example 13.3.1

Year    Annual sales    5-year moving total    5-year moving average
1952          1
1953         14
1954         14                 78                     15.6
1955         24                 96                     19.2
1956         25                 90                     18.0
1957         19                 93                     18.6
1958          8                 84                     16.8
1959         17                103                     20.6
1960         15                124                     24.8
1961         44                141                     28.2
1962         40                133                     26.6
1963         25                153                     30.6
1964          9                144                     28.8
1965         35                159                     31.8
1966         35                156                     31.2
1967         55                195                     39.0
1968         22                236                     47.2
1969         48                276                     55.2
1970         76                420                     84.0
1971         75                481                     96.2
1972        199                529                    105.8
1973         83                549                    109.8
1974         96                616                    123.2
1975         96                463                     92.6
1976        142                466                     93.2
1977         46                386                     77.2
1978         86                446                     89.2
1979         16                447                     89.4
1980        156                433                     86.6
1981        143
1982         32

FIGURE 13.3.1
Number of units of real estate sold by a broker, 1952-1982, original data and five-year
moving average

Exercises

13.3.1 Compute a five-year moving average, using the data of Exercise 13.2.1.

13.3.2 Compute a three-year moving average, using the data of Exercise 13.2.2.

13.3.3 Compute a three-year moving average, using the data of Exercise 13.2.3.

13.3.4 Compute a three-year moving average, using the data of Exercise 13.2.4.


13.4 MEASURING SEASONAL VARIATION 

The term seasonal variation brings to mind those fluctuations associated with 
climate and seasonally related activities and customs. Climate causes variation in 
the production of farm products and the availability of other raw materials. It also
affects the flow of goods and human patterns of consumption in other ways. The 
seasonal aspect of recreational activities is an obvious example. Social customs 
related to holidays have an influence on such variables as department-store sales 
and the production of certain farm products. 

Conceivably, we could make time-series data available for any time period 
desired. Most business data, however, are organized by days, weeks, months, or 
quarters. 

The analyst can use a number of methods for measuring seasonal variation. 
However, since our objective here is just to introduce the subject, we shall not 
cover all of them. We shall limit our discussion to the ratio-to-moving-average 
method. This is generally considered to be the most practical and theoretically 
satisfying method. The technique has the additional advantage of being widely 
used. It consists of two basic steps. 

1. Calculating a somewhat sophisticated 12-month moving average for each time 
period 

2. Dividing this moving average for each time period into the original value and 
multiplying by 100 to obtain the ratio-to-moving-average value 


We can use these results to calculate seasonal indexes and to deseasonalize
data, as the following example shows.

EXAMPLE 13.4.1 Column 1 of Table 13.4.1 shows the monthly sales, in bushels,
of a certain variety of apple for the years 1979 through 1983, by a farmers’
cooperative. Columns 2 through 5 contain the computations needed to carry out
the ratio-to-moving-average method for computing indexes of seasonal variation.
The data are plotted in Figure 13.4.1.

The seasonal pattern associated with the sales of apples can be readily seen in
Figure 13.4.1. The peak periods are from October through March. The lowest
point on the curve each year occurs in August.

TABLE 13.4.1
Monthly sales of apples, 1979-1983, by a farmers' cooperative

Columns: (1) number of bushels; (2) 12-month moving total; (3) sum of two 12-month
moving totals; (4) centered 12-month moving average; (5) ratio-to-moving average

Year    Month      (1)        (2)        (3)         (4)       (5)
1979    Jan.      2,406
        Feb.      2,604
        Mar.      3,112
        Apr.      2,915
        May       2,033
        June        643     20,852
        July        291     20,061     40,913     1,704.7     17.1
        Aug.         67     19,090     39,151     1,631.3      4.1
        Sept.       491     18,077     37,167     1,548.6     31.7
        Oct.      2,394     16,969     35,046     1,460.2    164.0
        Nov.      2,085     15,956     32,925     1,371.9    152.0
        Dec.      1,811     15,579     31,535     1,314.0    137.8
1980    Jan.      1,615     15,432     31,011     1,292.1    125.0
        Feb.      1,633     15,421     30,853     1,285.5    127.0
        Mar.      2,099     15,738     31,159     1,298.3    161.7
        Apr.      1,807     16,810     32,548     1,356.2    133.2
        May       1,020     17,493     34,303     1,429.3     71.4
        June        266     18,894     36,387     1,516.1     17.5
        July        144     20,473     39,367     1,640.3      8.8
        Aug.         56     21,941     42,414     1,767.2      3.2
        Sept.       808     23,338     45,279     1,886.6     42.8
        Oct.      3,466     23,657     46,995     1,958.1    177.0
        Nov.      2,768     23,993     47,650     1,985.4    139.4
        Dec.      3,212     24,176     48,169     2,007.0    160.0
1981    Jan.      3,194     24,179     48,355     2,014.8    158.5
        Feb.      3,101     24,156     48,335     2,014.0    154.0
        Mar.      3,496     24,186     48,342     2,014.2    173.6
        Apr.      2,126     23,086     47,272     1,969.7    107.9
        May       1,356     22,108     45,194     1,883.1     72.0
        June        449     21,390     43,498     1,812.4     24.8
        July        147     20,332     41,722     1,738.4      8.5
        Aug.         33     19,227     39,559     1,648.3      2.0
        Sept.       838     17,945     37,172     1,548.8     54.1
        Oct.      2,366     18,089     36,034     1,501.4    157.6
        Nov.      1,790     18,290     36,379     1,515.8    118.1
        Dec.      2,494     18,735     37,025     1,542.7    161.7
1982    Jan.      2,136     19,177     37,912     1,579.7    135.2
        Feb.      1,996     19,328     38,505     1,604.4    124.4
        Mar.      2,214     18,821     38,149     1,589.5    139.3
        Apr.      2,270     17,973     36,794     1,533.1    148.1
        May       1,557     17,709     35,682     1,486.8    104.7
        June        894     17,515     35,224     1,467.7     60.9
        July        589     17,004     34,519     1,438.3     41.0
        Aug.        184     16,775     33,779     1,407.5     13.1
        Sept.       331     16,691     33,466     1,394.4     23.7
        Oct.      1,518     16,087     32,778     1,365.8    111.1
        Nov.      1,526     15,965     32,052     1,335.5    114.3
        Dec.      2,300     15,497     31,462     1,310.9    175.5
1983    Jan.      1,625     15,027     30,524     1,271.8    127.8
        Feb.      1,767     14,859     29,886     1,245.2    141.9
        Mar.      2,130     14,759     29,618     1,234.1    172.6
        Apr.      1,666     14,328     29,087     1,212.0    137.5
        May       1,435     14,295     28,623     1,192.6    120.3
        June        426     13,713     28,008     1,167.0     36.5
        July        119
        Aug.         16
        Sept.       231
        Oct.      1,087
        Nov.      1,493
        Dec.      1,718






The ratio-to-moving-average method may proceed according to the following 
six steps. (All column references are to Table 13.4.1.) The ultimate objective is 
to produce a seasonal index, a number for each month showing the original value 
for that month as a percentage of the average month. The method we use rests 
on the multiplicative model, Y = T • S • C • E, discussed earlier. The calculations 
consist of obtaining an estimate of the T • C component, which is divided into 
each observed value of Y to obtain an estimate of S • E. Finally we obtain an 
estimate of S by a method designed to eliminate E. 

1. We first obtain the 12-month moving totals. The first figure in column 2 is the 
total for the 12 months of 1979. This value falls between the months of June and 
July and is so placed in the table. The second figure in column 2 is the sum of
values from February 1979 through January 1980, inclusive. In this manner, we
obtain the 12-month moving totals shown in column 2. 

2. Column 3 gives the sum of two consecutive 12-month moving totals. We get 
these by adding the figures in column 2, a pair at a time, in such a way that 
beginning with the second, we add each figure once to the preceding figure and 
once to the succeeding figure. Thus, the first entry in column 3 is the sum of the 
first and second figures in column 2. The second entry in column 3 is the sum of 
the second and third figures in column 2. 

3. We find the centered 12-month moving averages of column 4 by dividing the 
entries in column 3 by 24. We find the 12-month moving average in order to get 
an estimate of the T • C component. Dividing both sides of the model equation 
by T • C yields an estimate of the S • E component. Column 5 shows the result. 
The next step explains its calculation. 

4. To obtain column 5, we divide the original monthly entries in column 1 by 
the corresponding entries in column 4, and multiply by 100. 
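
Steps 1 through 4 can be sketched in Python as shown below. The function name and the dictionary it returns are our own choices for illustration; the data are the first 24 months of column 1 of Table 13.4.1, which is enough to reproduce the first twelve entries of columns 4 and 5.

```python
def ratio_to_moving_average(monthly):
    """Return {month position: (centered 12-month average, ratio)}.

    Positions count from 0 at the first month of the series.
    """
    # Step 1: 12-month moving totals (column 2).
    totals = [sum(monthly[i:i + 12]) for i in range(len(monthly) - 11)]
    result = {}
    for j in range(len(totals) - 1):
        # Steps 2 and 3: add two consecutive totals and divide by 24 (columns 3 and 4).
        centered = (totals[j] + totals[j + 1]) / 24
        # The pair of totals is centered on the seventh month of the first total.
        month = j + 6
        # Step 4: original value as a percentage of the centered average (column 5).
        result[month] = (centered, 100 * monthly[month] / centered)
    return result


# Bushels sold, January 1979 through December 1980 (Table 13.4.1, column 1).
bushels = [2406, 2604, 3112, 2915, 2033, 643, 291, 67, 491, 2394, 2085, 1811,
           1615, 1633, 2099, 1807, 1020, 266, 144, 56, 808, 3466, 2768, 3212]

ratios = ratio_to_moving_average(bushels)
july_1979 = ratios[6]      # position 6 is July 1979
june_1980 = ratios[17]     # position 17 is June 1980
```

With two years of data the method yields values only for July 1979 through June 1980, which mirrors the six blank months at each end of Table 13.4.1.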

5. We next obtain for each month an “average” of the ratio-to-moving-average
values in column 5. We can use any legitimate quantitative average (for example,
the median or mean), depending on the nature of the data. The objective in
averaging is to obtain for each month a value that represents the typical seasonal
effect for that month. In a commonly used method, we compute a modified mean,
which is the arithmetic mean of the values when extreme values have been
discarded. We shall use that method of averaging. Rearranging the data of column
5 as in Table 13.4.2 makes the work easier.

We use a modified mean rather than, say, the arithmetic mean of all the values
in order to eliminate the problem of the influence of extreme values. If we
eliminate an equal number of the smallest and largest values, the resulting mean is
based on central values. In Table 13.4.2, we found the modified mean by
computing the mean of the ratio-to-moving-average values for each month after we
eliminated the largest and smallest value.


TABLE 13.4.2
Ratio-to-moving-average values, Example 13.4.1

                              Year                       Modified    Seasonal
Month         1979     1980     1981     1982     1983     mean        index
January                125.0    158.5    135.2    127.8    131.5       132.4
February               127.0    154.0    124.4    141.9    134.4       135.3
March                  161.7    173.6    139.3    172.6    167.2       168.4
April                  133.2    107.9    148.1    137.5    135.4       136.3
May                     71.4     72.0    104.7    120.3     88.4        89.0
June                    17.5     24.8     60.9     36.5     30.6        30.8
July           17.1      8.8      8.5     41.0              13.0        13.1
August          4.1      3.2      2.0     13.1               3.6         3.6
September      31.7     42.8     54.1     23.7              37.2        37.5
October       164.0    177.0    157.6    111.1             160.8       161.9
November      152.0    139.4    118.1    114.3             128.8       129.7
December      137.8    160.0    161.7    175.5             160.8       161.9
Total                                                    1,191.7     1,199.9






6. Finally we get the seasonal indexes by adjusting the modified means so that 
their sum is 1200. To do so, we multiply each modified mean by 1200/1191.7 = 
1.0069648. The last column of Table 13.4.2 shows the seasonal indexes. 
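
Steps 5 and 6 can be sketched in Python as follows. The helper modified_mean is a hypothetical name; it drops one smallest and one largest value, as in Table 13.4.2. To stay in step with the printed figures, the adjustment to a total of 1200 starts from the rounded modified means shown in the table rather than from unrounded ones.

```python
def modified_mean(values):
    """Arithmetic mean after discarding the single smallest and largest values."""
    trimmed = sorted(values)[1:-1]
    return sum(trimmed) / len(trimmed)


# Step 5 for two months, using the ratios of Table 13.4.2.
january = modified_mean([125.0, 158.5, 135.2, 127.8])
october = modified_mean([164.0, 177.0, 157.6, 111.1])

# Step 6: rescale the twelve modified means of Table 13.4.2 so they sum to 1200.
modified_means = [131.5, 134.4, 167.2, 135.4, 88.4, 30.6,
                  13.0, 3.6, 37.2, 160.8, 128.8, 160.8]
factor = 1200 / sum(modified_means)
seasonal_index = [m * factor for m in modified_means]

print(round(january, 1), round(october, 1), round(factor, 7))   # 131.5 160.8 1.0069648
```

The unrounded indexes sum to 1200; the printed column sums to 1,199.9 only because each index is rounded to one decimal place.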


Once we have obtained a measure of seasonal variation, the question is: How 
can we use it? Two general uses are (1) to analyze past data, and (2) to plan 
future activity. 

First of all, we can remove seasonal variation from a series in order to see how
things might have been had there been no seasonal fluctuation. The results of such
calculations are deseasonalized data. In order to show how a series is
deseasonalized, let us apply the method to the 1983 data of Example 13.4.1. The
computations consist of dividing each observed monthly value by the corresponding
seasonal index and multiplying by 100. For example, the January 1983 index for
the sale of apples series is 132.4. This means that the January shipments are 132.4
percent of the average month. If we divide the January 1983 sales, which were
1625, by 132.4 and multiply by 100, we have

(1625 / 132.4) × 100 = 1227

We say, then, that had there been no seasonal variation, January 1983 sales
would have been 1227 bushels. Alternatively, we could have divided 1625 by
1.324 and obtained the same result. We shall divide each of the seasonal indexes
by 100. We then divide the corresponding monthly sales figures for the year 1983
by this result. Table 13.4.3 shows the deseasonalized data.
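
The deseasonalizing computation for all twelve months of 1983 can be sketched in Python. The variable names are ours; the sales come from Table 13.4.1 and the indexes from the last column of Table 13.4.2.

```python
# 1983 monthly sales in bushels (Table 13.4.1) and seasonal indexes (Table 13.4.2).
sales_1983 = [1625, 1767, 2130, 1666, 1435, 426, 119, 16, 231, 1087, 1493, 1718]
seasonal_index = [132.4, 135.3, 168.4, 136.3, 89.0, 30.8,
                  13.1, 3.6, 37.5, 161.9, 129.7, 161.9]

# Divide each observed value by its index and multiply by 100.
deseasonalized = [100 * y / s for y, s in zip(sales_1983, seasonal_index)]
rounded = [round(v) for v in deseasonalized]

print(rounded[0])    # 1227, the January figure worked out above
```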

TABLE 13.4.3 Deseasonalized data, Example 13.4.1

Month        Sales 1983 (bushels)   Seasonal index/100   Deseasonalized sales
January      1,625                  1.324                1,227
February     1,767                  1.353                1,306
March        2,130                  1.684                1,265
April        1,666                  1.363                1,222
May          1,435                  0.890                1,612
June           426                  0.308                1,383
July           119                  0.131                  908
August          16                  0.036                  444
September      231                  0.375                  616
October      1,087                  1.619                  671
November     1,493                  1.297                1,151
December     1,718                  1.619                1,061

Suppose that it is estimated that total sales for 1984 will be 10,000 bushels. What might we expect the month-by-month sales to be? If we did not know about seasonal variation, we might estimate each month’s sales to be 1/12 of the yearly figure, or 833 bushels. We know, however, that there is variation from month to month, and we have a measure of this variation. For example, we expect January sales to be about 132.4% of the average month. For an estimate of the January sales that takes seasonal variation into account, we multiply 833 by 1.324 to obtain 1103. Assuming that the same seasonal pattern persists into 1984, we obtain the estimates of monthly sales shown in Table 13.4.4.

TABLE 13.4.4 Estimates of monthly sales of apples for 1984

Month        Seasonal index/100   Estimated sales (833 × seasonal index/100)
January      1.324                1,103
February     1.353                1,127
March        1.684                1,403
April        1.363                1,135
May          0.890                  741
June         0.308                  257
July         0.131                  109
August       0.036                   30
September    0.375                  312
October      1.619                1,349
November     1.297                1,080
December     1.619                1,349
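The two calculations above (deseasonalizing the 1983 sales and spreading an annual estimate over the months) can be sketched in a few lines of Python. This is only an illustrative sketch: the variable names are our own, the figures are those of Tables 13.4.3 and 13.4.4, and the average month is rounded to 833 bushels as in the text.

```python
# 1983 monthly apple sales (bushels) and seasonal indexes from Example 13.4.1.
sales_1983 = [1625, 1767, 2130, 1666, 1435, 426, 119, 16, 231, 1087, 1493, 1718]
seasonal_index = [132.4, 135.3, 168.4, 136.3, 89.0, 30.8,
                  13.1, 3.6, 37.5, 161.9, 129.7, 161.9]

# Deseasonalize: divide each month's sales by (seasonal index / 100).
deseasonalized = [round(y / (s / 100)) for y, s in zip(sales_1983, seasonal_index)]

# Estimate 1984 monthly sales from an assumed annual total of 10,000 bushels.
annual_estimate = 10_000
average_month = round(annual_estimate / 12)   # 833, rounded as in the text
estimated_1984 = [round(average_month * s / 100) for s in seasonal_index]

print(deseasonalized)
print(estimated_1984)
```

Rounding the average month before multiplying follows the book's arithmetic; carrying the unrounded value 833.33 would change a few estimates by one bushel.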

Exercises 13.4.1 The following table gives the number of pairs of water skis sold by a sporting-goods dealer, by month and year, 1979-1983. (a) Plot the data. (b) Compute the seasonal indexes from these data, as in Example 13.4.1. (c) Compute the deseasonalized values for 1983.

Year    J    F    M    A    M    J    J    A    S    O    N    D
1979    0    2   10    4   89   33   11    4   17    5   17    0
1980    3    0    5    4   14   23    7   11   11    4    4    8
1981    9    2   46   11   14   30   22    4    7    4    0    2
1982   13    4   56   30   90   20   15   11    6    5    1    7
1983    4   12    6   10   17   32   24    9   10    5   17    1

13.4.2 In 1984 the dealer sold 180 pairs of water skis. Using the results of Exercise 13.4.1, estimate the number of sales by month.

13.4.3 In 1984, the sales by month were as follows. (a) How well do you think your estimates agree with the actual data? (b) How do you explain the discrepancies? (c) How would you suggest improving your estimates?

Month       Sales    Month        Sales
January       0      July           20
February      8      August          8
March        29      September      13
April        26      October         2
May          42      November        3
June         29      December        0


13.4.4 The following table shows a toy manufacturer’s monthly sales (× $10,000) over a period of 5 years. (a) Plot the data. (b) Compute the seasonal indexes. (c) Compute the deseasonalized values for 1983. (d) In 1984 total sales are $2,050,000. Estimate the sales by month.

                              Month
Year    J    F    M    A    M    J    J    A    S    O    N    D
1979   10   12   12   14   12   15   14   10   10   13   16   28
1980    8   12   12   12   13   16   15    9    9   12   18   31
1981    7   10   10   13   14   16   15    8    8   14   17   37
1982   10   14   14   14   15   17   14   10   10   15   20   38
1983   12   14   13   14   15   17   15   10   11   15   20   40


13.5 MEASURING CYCLICAL VARIATION 


Business cycles are fluctuations in the total economic activity that are beyond the 
control of the businessperson. Cycles reflect periods of rapid growth followed by 
periods of slower growth or recession. There is no regularity to business cycles, 
in the sense that they do not occur at regular time intervals and are not of the 
same duration. 

For these reasons, we cannot forecast cycles as we can seasonal fluctuations. 
We can, however, isolate them, and we have a number of methods of doing so. 
We can measure cyclical variation in data reported either on a monthly basis or 
annually. If we use monthly data, we can eliminate the seasonal component. The 
following example shows how to measure cyclical variation when we have only 
annual data available. Example 13.5.2 demonstrates the method using monthly 
data. 

The method that we give here for dealing with annual data is commonly used, computationally simple, and intuitively appealing. It rests on the fact that in annual data there are only two components, trend and cycle. Seasonal fluctuations do not appear, because all seasons are represented. It also assumes that irregular variations have little influence on annual data. We can express the model that remains as

$$Y = T \cdot C \tag{13.5.1}$$

If we divide both sides of the equation by T, we have

$$C = \frac{Y}{T} \tag{13.5.2}$$

This indicates that dividing the values in the original series by the corresponding 
trend values yields a measure of the cycle. This measure, when multiplied by 
100, is called a cyclical relative. 


EXAMPLE 13.5.1 We wish to isolate the cyclical component from a series composed 
of the annual production of pairs of shoes by a shoe manufacturer in the United 
States from 1944 to 1982. Table 13.5.1 gives the original data. 

The first step in the procedure is to compute the trend values. We shall do this by the method of least squares. Table 13.5.1 also shows the intermediate computations. From the data in the table we obtain




TABLE 13.5.1 Pairs of shoes produced (× 1000) by a shoe manufacturer, 1944-1982

(1)     (2)             (3)              (4)        (5)     (6)
Year    Year code (x)   Production (y)   xy         x²      y_t
1944    -19             380              -7,220     361     443
1945    -18             417              -7,506     324     445
1946    -17             406              -6,902     289     447
1947    -16             450              -7,200     256     449
1948    -15             478              -7,170     225     450
1949    -14             423              -5,922     196     452
1950    -13             443              -5,759     169     454
1951    -12             503              -6,036     144     455
1952    -11             552              -6,072     121     457
1953    -10             579              -5,790     100     459
1954     -9             466              -4,194      81     461
1955     -8             569              -4,552      64     462
1956     -7             416              -2,912      49     464
1957     -6             422              -2,532      36     466
1958     -5             565              -2,825      25     467
1959     -4             484              -1,936      16     469
1960     -3             520              -1,560       9     471
1961     -2             573              -1,146       4     473
1962     -1             518                -518       1     474
1963      0             501                   0       0     476
1964      1             535                 535       1     478
1965      2             468                 936       4     479
1966      3             382               1,146       9     481
1967      4             310               1,240      16     483
1968      5             334               1,670      25     485
1969      6             359               2,154      36     486
1970      7             372               2,604      49     488
1971      8             439               3,512      64     490
1972      9             446               4,014      81     491
1973     10             349               3,490     100     493
1974     11             395               4,345     121     495
1975     12             461               5,532     144     497
1976     13             514               6,682     169     498
1977     14             583               8,162     196     500
1978     15             590               8,850     225     502
1979     16             620               9,920     256     503
1980     17             578               9,826     289     505
1981     18             534               9,612     324     507
1982     19             631              11,989     361     509
Total                   18,565            8,467   4,940



$$a = \frac{18{,}565}{39} = 476.0$$

and

$$b = \frac{8467}{4940} = 1.714$$

The trend equation, then, is

$$y_t = 476 + 1.714x$$

(Origin, 1963; time unit, 1 year; y, annual production of pairs of shoes, 1944- 
1982, in thousands.) Substituting the year codes into the equation yields the trend 



values shown in column 6 of Table 13.5.1. The original values and the trend line 
are plotted in Figure 13.5.1. 

Column 4 of Table 13.5.2 shows the cyclical relatives, obtained by dividing 
observed production by trend and multiplying by 100. The cyclical relatives are 
plotted in Figure 13.5.2. 
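The computations of Example 13.5.1 can be reproduced with a short Python sketch. The function name and layout are our own illustration, not part of the text; the trend values are rounded to whole units because that is how they appear in column 6 of Table 13.5.1, and the cyclical relatives are then computed from those rounded values.

```python
# Annual production of pairs of shoes (x 1000), 1944-1982, from Table 13.5.1.
years = list(range(1944, 1983))
production = [380, 417, 406, 450, 478, 423, 443, 503, 552, 579, 466, 569, 416,
              422, 565, 484, 520, 573, 518, 501, 535, 468, 382, 310, 334, 359,
              372, 439, 446, 349, 395, 461, 514, 583, 590, 620, 578, 534, 631]

def least_squares_trend(y):
    """Fit y_t = a + b*x for an odd number of periods, with year codes
    x = ..., -1, 0, 1, ... chosen so that the codes sum to zero."""
    n = len(y)
    mid = (n - 1) // 2
    x = [i - mid for i in range(n)]
    a = sum(y) / n
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return a, b, x

a, b, codes = least_squares_trend(production)

# Trend values rounded to whole units, as in column 6 of Table 13.5.1.
trend = [round(a + b * x) for x in codes]

# Cyclical relative = (observed / trend) * 100, Equation 13.5.2 times 100.
cyclical_relatives = [round(100 * y / t, 1) for y, t in zip(production, trend)]

for year, c in zip(years, cyclical_relatives):
    print(year, c)
```

A relative above 100 marks a year in which production ran above the trend line; one below 100 marks a year below it.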


In monthly data, the seasonal factor is present, as are any irregular fluctuations. 
To eliminate both the trend and the seasonal component, we divide the original 
observations by both the trend value and the seasonal index. We may express this 
as 


$$\frac{Y}{T \cdot S} = \frac{T \cdot S \cdot C \cdot E}{T \cdot S} \tag{13.5.3}$$

which leads to

$$C \cdot E = \frac{Y}{T \cdot S} \tag{13.5.4}$$


If we multiply C • E by 100, we get what are called the cyclical irregulars. As 
you can see, we have not eliminated the irregular movements. We try to eliminate 
them by computing a weighted moving average from the cyclical irregulars. The 
resulting averages are the cyclical relatives. We compute the weighted moving 
average as follows: Assume that for a series of data beginning in January we want 


FIGURE 13.5.1 Annual production of pairs of shoes (× 1000), 1944-1982, original data and trend line










TABLE 13.5.2 Pairs of shoes produced annually (× 1000) by a shoe manufacturer (adjustment for secular trend)

(1)     (2)                   (3)      (4)
Year    Production (× 1000)   Trend    Cyclical relatives (column 2 divided by column 3, times 100)
1944    380                   443       85.8
1945    417                   445       93.7
1946    406                   447       90.8
1947    450                   449      100.2
1948    478                   450      106.2
1949    423                   452       93.6
1950    443                   454       97.6
1951    503                   455      110.5
1952    552                   457      120.8
1953    579                   459      126.1
1954    466                   461      101.1
1955    569                   462      123.2
1956    416                   464       89.7
1957    422                   466       90.6
1958    565                   467      121.0
1959    484                   469      103.2
1960    520                   471      110.4
1961    573                   473      121.1
1962    518                   474      109.3
1963    501                   476      105.3
1964    535                   478      111.9
1965    468                   479       97.7
1966    382                   481       79.4
1967    310                   483       64.2
1968    334                   485       68.9
1969    359                   486       73.9
1970    372                   488       76.2
1971    439                   490       89.6
1972    446                   491       90.8
1973    349                   493       70.8
1974    395                   495       79.8
1975    461                   497       92.8
1976    514                   498      103.2
1977    583                   500      116.6
1978    590                   502      117.5
1979    620                   503      123.3
1980    578                   505      114.5
1981    534                   507      105.3
1982    631                   509      124.0


a three-month weighted moving average. We find it by adding the cyclical irregular for January once, the cyclical irregular for February twice, and the cyclical irregular for March once. We then divide the total by 1 + 2 + 1 = 4. We continue in a similar manner throughout the series. A five-month weighted moving average would use the values for January through May, weighted 1, 4, 6, 4, 1, respectively, where these weights, like the weights 1, 2, 1, are binomial coefficients. We can use other numbers of months, with the appropriate binomial coefficients as weights. The weighted moving average lets us control the extent of smoothing by the choice of weights.

FIGURE 13.5.2 Cyclical relatives, pairs of shoes produced (× 1000) by a shoe manufacturer, 1944-1982

EXAMPLE 13.5.2 We can use the data of Example 13.4.1 to show how to measure 
cyclical variation from monthly data. Although the study of cyclical variation 
would ordinarily cover a longer period of time, we shall limit our illustration to 
a single year, 1979. 

The first step is to obtain monthly trend values. Under certain conditions we 
can get these from the trend equation based on annual data. In this example, 
however, it seems better to compute monthly values directly from the data. When 
we want the least-squares trend line, the procedure we use is identical to that for 
annual data presented in Section 13.2, except that we use months rather than years 
as the X values. For the data in Table 13.4.1, for example, there are an even number of observations, so that we assign June 1981 a code of -1, July 1981 a code of 1, and so on. We can compute the following least-squares equation from these data:

$$y_t = 1539.4 - 6.405x$$

(Origin, July 1, 1981; time unit, one-half month; y, monthly sales of apples, 1979-1983.) If we successively substitute -59, -57, ..., 59 into the least-squares equation, the result is the corresponding monthly trend values. These are shown for the year 1979 in column 2 of Table 13.5.3.

The next components needed are the seasonal indexes. These are given in Table 13.4.2 and recorded again in column 3 of Table 13.5.3. We multiply each of the monthly trend values by the corresponding seasonal index and divide by 100. This product is called the normal, since it represents the value that we would expect for each month if trend and seasonal variation were the only components present.




TABLE 13.5.3 Calculations for cyclical irregulars for the year 1979, Example 13.5.2

             (1)         (2)         (3)                  (4)                  (5)
Month        Sales (Y)   Trend (T)   Seasonal index (S)   Normal (T × S)/100   Cyclical irregulars (C × E) = 100Y/[(T × S)/100]
January      2,406       1,917.3     132.4                2,538.5               94.8
February     2,604       1,904.5     135.3                2,576.8              101.1
March        3,112       1,891.7     168.4                3,185.6               97.7
April        2,915       1,878.9     136.3                2,560.9              113.8
May          2,033       1,866.1      89.0                1,660.8              122.4
June           643       1,853.3      30.8                  570.8              112.6
July           291       1,840.4      13.1                  241.1              120.7
August          67       1,827.6       3.6                   65.8              101.8
September      491       1,814.8      37.5                  680.6               72.1
October      2,394       1,802.0     161.9                2,917.4               82.1
November     2,085       1,789.2     129.7                2,320.6               89.8
December     1,811       1,776.4     161.9                2,876.0               63.0


TABLE 13.5.4 Calculation of cyclical relatives by three-month moving average, Example 13.5.2

             (1)                    (2)                                          (3)
Month        Cyclical irregulars    Weighted three-month moving total (1, 2, 1)  Cyclical relatives
             C × E                                                               C = column 2/4
January       94.8
February     101.1                  394.7                                         98.7
March         97.7                  410.3                                        102.6
April        113.8                  447.7                                        111.9
May          122.4                  471.2                                        117.8
June         112.6                  468.3                                        117.1
July         120.7                  455.8                                        114.0
August       101.8                  396.4                                         99.1
September     72.1                  328.1                                         82.0
October       82.1                  326.1                                         81.5
November      89.8                  324.7                                         81.2
December      63.0




That is, (T × S)/100 gives the values that we would expect if the cyclical and irregular factors were not present. Note that dividing by 100 in column 4 cancels the effect of multiplying by 100, which we did in step 5 of the procedure by which we computed the seasonal indexes. Finally we find the cyclical irregulars by dividing the observed values of Y by (T × S)/100 and multiplying by 100. This is the step indicated by Equation 13.5.4.

Now we can remove the irregular component by means of a three-month moving 
average. The results of this step give the cyclical relatives, which represent the 
cyclical effect alone. That is, we have eliminated the effects of trend, season, and 
erratic variation. Table 13.5.4 gives the results. 
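The chain of calculations in Example 13.5.2 (trend, normal, cyclical irregulars, weighted moving average) can be sketched as follows. The function names are hypothetical, chosen only for illustration. Because the sketch carries unrounded intermediate values, its results may differ from Tables 13.5.3 and 13.5.4 by about 0.1 in the last printed digit.

```python
# 1979 monthly sales and seasonal indexes from Tables 13.5.3 and 13.4.2.
sales_1979 = [2406, 2604, 3112, 2915, 2033, 643, 291, 67, 491, 2394, 2085, 1811]
seasonal_index = [132.4, 135.3, 168.4, 136.3, 89.0, 30.8,
                  13.1, 3.6, 37.5, 161.9, 129.7, 161.9]

def trend_value(x):
    """Least-squares trend from the example: origin July 1, 1981,
    time unit one-half month."""
    return 1539.4 - 6.405 * x

# Month codes for 1979: -59, -57, ..., -37.
codes_1979 = list(range(-59, -35, 2))

# Normal = (T x S)/100; cyclical irregular = 100 * Y / normal (Equation 13.5.4).
normal = [trend_value(x) * s / 100 for x, s in zip(codes_1979, seasonal_index)]
cyclical_irregulars = [100 * y / n for y, n in zip(sales_1979, normal)]

def weighted_moving_average(values, weights=(1, 2, 1)):
    """Centered weighted moving average; the ends of the series, where the
    window does not fit, are dropped."""
    half = len(weights) // 2
    total = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, values[i - half:i + half + 1])) / total
        for i in range(half, len(values) - half)
    ]

# Cyclical relatives for February through November.
cyclical_relatives = weighted_moving_average(cyclical_irregulars)
print([round(c, 1) for c in cyclical_relatives])
```

Passing `weights=(1, 4, 6, 4, 1)` gives the five-month version described earlier; the wider window smooths more but loses two months at each end of the series.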

Exercises 13.5.1 The following table contains figures on cotton production (bales) on a Southern farm from 1930 through 1979. (a) Plot the original data and the least-squares trend line. (b) Compute and plot the cyclical relatives.

        Cotton                 Cotton                 Cotton
Year    production     Year    production     Year    production
1930    309            1947    249            1964    234
1931    360            1948    286            1965    339
1932    369            1949    124            1966    386
1933    256            1950    173            1967    401
1934    398            1951    331            1968    491
1935    377            1952    301            1969    436
1936    258            1953    165            1970    234
1937    381            1954     90            1971    322
1938    398            1955    142            1972    397
1939    327            1956     48            1973    432
1940    350            1957     72            1974    369
1941    360            1958    135            1975    370
1942    151            1959    204            1976    331
1943    299            1960    188            1977    297
1944    248            1961    160            1978    393
1945    140            1962    228            1979    414
1946    310            1963    209





13.5.2 Using the data of Example 13.4.1: (a) Compute the cyclical irregulars for 1980 
through 1983. (b) Compute the cyclical relatives by means of a weighted three-month 
moving average. 

13.5.3 The following table shows the annual sales of a certain department store. (a) Plot the original data and the least-squares trend line. (b) Compute and plot the cyclical relatives.

Year                       1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984
Annual sales (× $10,000)      5    7    8    6    4    8   12   12   12   17   22   20   17   22   27


13.6 FORECASTING

We are familiar with the main components of time series and some of the techniques for measuring these components. With this background, let us now consider the complex problem of business forecasting. We make forecasts based on time-series analysis under the basic assumption that the future will be like the past. We extend the underlying patterns that we identified from the analysis of historical data into future time periods to obtain forecasts for those periods. We use the secular-trend equation for this purpose.

If there is no seasonal component present, the values obtained represent our 
forecasts. If we recognize a seasonal component and compute seasonal indexes, 
we must multiply the values obtained from the secular-trend equation by the 
appropriate seasonal index. Example 13.6.1 illustrates the procedures involved. 


EXAMPLE 13.6.1 We are given the following secular-trend equation based on the 
analysis of annual sales of washing machines by a certain manufacturer for the 
years 1968-1982. 


$$y_t = 1.8624 + 0.1728x \tag{13.6.1}$$

(Origin, 1975; time unit, 1 year; annual sales, in 10,000 units.) We want to forecast both annual and monthly sales for 1983.

We obtain the trend value for 1982 by substituting x = 7 in the trend equation. This substitution yields

$$y_t = 1.8624 + 1.2096 = 3.072 \quad \text{or 30,720 units}$$

We obtain the forecast for 1983 by substituting 8 into the equation. The result is

$$y_t = 1.8624 + 1.3824 = 3.2448 \quad \text{or 32,448 units}$$

This is the desired annual forecast for 1983. 

Now we turn our attention to monthly forecasts for 1983. First we change the 
secular-trend equation so that it provides average monthly sales instead of annual 
sales for each year. We do this by dividing each term by 12: 

$$\frac{y_t}{12} = \frac{1.8624}{12} + \frac{0.1728}{12}\,x$$

The modified equation is

$$y_{tm} = 0.1552 + 0.0144x$$

(Origin, 1975; time unit, 1 year; monthly sales in 10,000 units.) We need to make a second modification in order to change the time unit from years to months. To do this, we divide 0.0144 by 12. The equation now becomes

$$y_{tm} = 0.1552 + 0.0012x_m$$

The origin of the original secular-trend equation was centered on the middle of 1975, which is between June and July 1975. We make a final modification in order to center the origin at the middle of July 1975. This modification consists of adding one-half of the monthly trend to the origin. Thus the final secular-trend equation for monthly forecasts is

$$y_{tm} = (0.1552 + 0.0012/2) + 0.0012x_m = 0.1558 + 0.0012x_m \tag{13.6.2}$$

(Origin, July 1975; time unit, 1 month; monthly sales in 10,000 units.)

If sales are not influenced by seasonal fluctuations, we can use Equation 13.6.2 to provide monthly forecasts for 1983. Substituting $x_m = 90$ into Equation 13.6.2 results in a trend value of 0.2638 for January 1983. Substituting $x_m$ = 91, 92, 93, ..., 101 into Equation 13.6.2 yields forecasts for the remaining 11 months of 1983.

The second column of Table 13.6.1 (page 470) shows the trend values for all 
12 months of 1983. As we said before, if no seasonal pattern is apparent, we can 
use these trend values as the monthly forecasts for 1983. 
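As a rough sketch of this substitution, the following Python fragment evaluates Equation 13.6.2 for the 12 months of 1983. It also applies the seasonal multiplication described at the start of this section, using the seasonal indexes listed in column 3 of Table 13.6.1. The names are our own and the rounding to four decimals simply mirrors the table.

```python
def monthly_trend(x_m):
    """Equation 13.6.2: origin July 1975, x_m in months,
    monthly sales in 10,000 units."""
    return 0.1558 + 0.0012 * x_m

# January 1983 is 90 months after July 1975; December 1983 is 101 months after.
trend_1983 = [round(monthly_trend(x_m), 4) for x_m in range(90, 102)]

# Seasonal indexes (as proportions) from column 3 of Table 13.6.1.
seasonal = [0.95, 0.88, 0.98, 1.02, 0.91, 0.99, 0.94, 0.96, 1.05, 0.92, 1.10, 1.30]

# Seasonally adjusted forecast = trend value x seasonal index.
forecasts_1983 = [round(t * s, 4) for t, s in zip(trend_1983, seasonal)]

for month, (t, f) in enumerate(zip(trend_1983, forecasts_1983), start=1):
    print(month, t, f)
```

If no seasonal pattern is present, `trend_1983` alone serves as the set of monthly forecasts.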

However, assume that a seasonal pattern does exist, as shown by the seasonal indexes in column 3 of Table 13.6.1. We would compute this type of seasonal index by the procedures discussed in Section 13.4. When we can identify seasonal fluctuations and compute a seasonal index, we get monthly (seasonal) forecasts by multiplying the trend values for each month by the seasonal index value for that month. Column 4 of Table 13.6.1 shows the forecasts for the present example.

TABLE 13.6.1 Work table for monthly forecasts

(1)          (2)                       (3)              (4)
Month        Trend values (× 10,000)   Seasonal index   Monthly forecasts for 1983 (× 10,000) (col 2 × col 3)
January      0.2638                    0.95             0.2506
February     0.2650                    0.88             0.2332
March        0.2662                    0.98             0.2609
April        0.2674                    1.02             0.2727
May          0.2686                    0.91             0.2444
June         0.2698                    0.99             0.2671
July         0.2710                    0.94             0.2547
August       0.2722                    0.96             0.2613
September    0.2734                    1.05             0.2871
October      0.2746                    0.92             0.2526
November     0.2758                    1.10             0.3034
December     0.2770                    1.30             0.3601

A General Conversion Procedure


In Example 13.6.1, we use Equation 13.6.1 when we want to make annual forecasts, and Equation 13.6.2 when we want to make monthly forecasts. We obtain Equation 13.6.2 by modifying Equation 13.6.1. Similarly, we can modify the original secular-trend equation to provide forecasts for other time periods, such as quarters, six-month periods, or five-year periods.

We now show a general method of converting a secular-trend equation that uses one unit of time to a secular-trend equation that uses another unit of time. Assume that the original secular-trend equation is

$$y_t = a + bx$$

and that the new modified equation will be

$$y_{tm} = a_m + b_m x_m + A$$

Here A is an adjustment to the origin that is used when necessary.

The general procedure involved consists of the following three steps: 

1. Change the unit used in the equation. 

2. Change the period being estimated or forecast. 

3. Center the estimate or forecast on the new unit, if necessary. 

These steps are carried out as follows: 

1. Divide the slope of the current equation by the proper constant, $n_x$, where $n_x$ is the number of new units contained in one of the units (x) used in the original equation. To change from year to month, divide by 12. To change from quarter to year, divide by 1/4. (Note that the period being estimated or forecast is unchanged by this step.)

2. Divide the entire equation that results from step 1 by the proper constant n, where n is the number of new units contained in one of the original forecast periods. To change from year to month, divide by 12. To change from month to year, divide by 1/12. (Note that $n_x$ and n may be equal.)

3. Add (or subtract) a portion of the slope of the equation from step 2 to the y intercept obtained in step 2. You do this in order to center forecasts of the new equation on the new unit. This step is not needed if the estimates or forecasts are already properly centered.

In formula form, the steps of the conversion procedure are:

1. $a + \dfrac{b}{n_x}\,x_m$

2. $\dfrac{a + \dfrac{b}{n_x}\,x_m}{n} = \dfrac{a}{n} + \dfrac{b}{n(n_x)}\,x_m$

3. $\dfrac{a}{n} + \dfrac{b}{n(n_x)}\,x_m + A$

Thus the new modified equation that we use is

$$y_{tm} = \frac{a}{n} + \frac{b}{n(n_x)}\,x_m + A = a_m + b_m x_m + A \tag{13.6.3}$$

where

$$a_m = \frac{a}{n}, \qquad b_m = \frac{b}{n(n_x)}, \qquad x_m = n_x x, \qquad \text{and} \qquad n_x = n \ \text{or} \ \frac{n}{2}$$

If you need to adjust the origin, so that it is centered at the middle of one of the new units, add or subtract a fraction of $b_m$, since the amount of the shift should be less than one of the new units. The procedure is the same as that discussed in Section 13.2.1.


The following examples will help you to understand the conversion procedure. 


EXAMPLE 13.6.2 Refer to Example 13.6.1. Modify Equation 13.6.1 so that you 
can use it to make quarterly forecasts. 

$n = n_x = 4$ (four quarters in one year)

You need to make an adjustment of $(1/2)b_m$ to center the origin at the third quarter of 1975. Use Equation 13.6.3 with adjustment.

$$y_{tm} = a_m + b_m x_m + \tfrac{1}{2} b_m = \frac{1.8624}{4} + \left(\frac{0.1728}{4}\right)\frac{x_m}{4} + \left(\frac{1}{2}\right)\frac{0.1728}{4^2}$$

$$= 0.4656 + 0.0108x_m + 0.0054 = 0.4710 + 0.0108x_m$$

(Origin, third quarter, 1975; time unit, 1 quarter; sales in 10,000 units.)
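The conversion in Equation 13.6.3 lends itself to a small helper function. The sketch below is our own illustration: the function name and the `shift` parameter (the fraction of $b_m$ added to the intercept as the adjustment A) are assumptions, not notation from the text. It is checked against the monthly equation of Example 13.6.1 and the quarterly equation of Example 13.6.2.

```python
def convert_trend(a, b, n, n_x, shift=0.0):
    """Convert y_t = a + b*x to y_tm = a_m + b_m*x_m + A (Equation 13.6.3).

    n     : number of new forecast periods in one original forecast period
    n_x   : number of new time units in one original time unit
    shift : fraction of b_m added to the intercept to re-center the origin
            (A = shift * b_m); use a negative value to subtract
    Returns (intercept including A, slope b_m).
    """
    a_m = a / n
    b_m = b / (n * n_x)
    return a_m + shift * b_m, b_m

# Annual equation of Example 13.6.1: y_t = 1.8624 + 0.1728x.
monthly = convert_trend(1.8624, 0.1728, n=12, n_x=12, shift=0.5)
quarterly = convert_trend(1.8624, 0.1728, n=4, n_x=4, shift=0.5)

print([round(v, 4) for v in monthly])     # intercept and slope, monthly
print([round(v, 4) for v in quarterly])   # intercept and slope, quarterly
```

Keeping the adjustment as a fraction of $b_m$ reflects the remark that the shift should be less than one of the new units.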

EXAMPLE 13.6.3 Refer to Example 13.6.1. Modify Equation 13.6.1 so that you 
can use it to make five-year forecasts. 



EXAMPLE 13.6.4 Refer to Example 13.2.2. Modify the secular-trend equation so that you can use it to make forecasts for both six-month periods and single months. The original trend equation is:

$$y_t = 523.92 + 6.33x$$

(Origin, 1977-1978; time unit, 6 months; number of job openings.)

First we find the following modified trend equation for six-month forecasts, in which n = 2, $n_x = 1$, and an adjustment of $(1/2)b_m$ is required.

$$y_{tm} = \frac{523.92}{2} + \left(\frac{6.33}{2}\right)x_m + \left(\frac{1}{2}\right)\frac{6.33}{2} = 263.5425 + 3.165x_m$$

(Origin, first six months, 1978; time unit, 6 months; number of job openings.)

Next we modify the original trend equation to obtain the modified trend equation for monthly forecasts, in which n = 12, $n_x = 6$, and an adjustment of $(1/2)b_m$ is required. The resulting equation is

$$y_{tm} = \frac{523.92}{12} + \left(\frac{6.33}{12}\right)\frac{x_m}{6} + \left(\frac{1}{2}\right)\left(\frac{6.33}{72}\right)$$

$$= 43.6600 + 0.08792x_m + 0.0440 = 43.7040 + 0.08792x_m$$

(Origin, January 1978; time unit, one month; number of job openings.)

Some Other Forecasting Techniques

Forecasts serve a variety of needs of managers in all types of organizations. These needs include: (1) Short-range plans for operations in the immediate future. (2) Intermediate-range plans for meeting, for the next 12 months, the needs for personnel, equipment, and materials. (3) Long-range plans for production capacity and the location of new factories. Forecasting techniques can be grouped into four basic categories, as follows.

1. Time-series methods. Time-series methods use past history as a basis for forecasting future time periods. The earlier sections of this chapter have dealt with some of the methods more commonly used in time-series analysis and forecasting.

Another commonly used technique in time-series analysis and forecasting is exponential smoothing. This technique is similar to the moving average discussed in Section 13.3, except that values are weighted exponentially to give greater weight to more recent data. Both trend and seasonal components can be included. One can readily adapt exponential smoothing for use with a computer. It is also an aid to forecasting when a large number of different items are involved (as in inventory control). This technique is used mostly for short-range forecasting, but you can use it for some intermediate-range forecasting, too.

2. Causal relationship methods. The methods in this category employ regression and econometric models that incorporate the various economic and competitive factors that cause or influence the demand for products and services. These methods are mainly used to obtain short-range and intermediate-range forecasts for existing products and services.

3. Predictive methods. There are many methods in this category. They include procedures that involve opinions from “experts,” various types of market surveys, historical analogy, and life-cycle analysis. These methods are used for the long-range forecasts needed in planning for production capacity and the location of new factories, and for the development, pricing, and formulation of market strategies for new and existing products.

4. Simulation methods. In its business context, simulation usually refers to the use of a computer to perform experiments on a model of some real-world situation or system. The model may represent an existing system, in which case it gives a decision-maker a method of forecasting the consequences to the system of each of several alternative decisions or courses of action before any changes are actually made in the system. Simulation can be useful as an aid in the design of a system such as a new aircraft, a factory, or a store layout. Using a model to examine the results expected for various combinations of conditions that can exist enables planners to identify and eliminate flaws or bottlenecks in the design. Simulation is also useful in training managers or other decision-makers in how the real system operates. A final important use of simulation is in handling large and/or complex problems that are difficult or impossible to handle with optimizing techniques.

[Forecasting methods are discussed in more detail in the books by Armstrong (1978), Bowerman and O’Connell (1979), Box and Jenkins (1976), Butler and Kavesh (1966), and Dauten and Valentine (1974).]

Exercises 13.6.1 Refer to Exercise 13.2.1. Obtain annual forecasts for the years 1985, 1986, and 1987.

13.6.2 Refer to Exercise 13.2.2. Obtain annual and six-month forecasts for 1985. 

13.6.3 Refer to Exercise 13.2.3. Obtain annual and seasonally adjusted quarterly forecasts for 1985. The seasonal factors for quarters are as follows.

Quarters            Seasonal factors
January-March       1.12
April-June          0.88
July-September      0.71
October-December    1.29


13.6.4 Refer to Exercise 13.2.4. Obtain annual and unadjusted monthly forecasts for 1983. 



13.7 INDEX NUMBERS

Sections 13.2 to 13.5 examined several methods of analyzing changes that occur over time. This section introduces another device, the index number, that is useful in describing changes in economic variables over time. Index numbers, essentially, are relative numbers that express the relationship between two figures, one of which is called the base. They are used to describe changes in business and economic activity, for example, changes over time in production, wages, prices, and employment.

The Consumer Price Index

The best known example of an index number is the Consumer Price Index (CPI) for Urban Wage Earners and Clerical Workers. This is the most generally accepted measure of changes in purchasing power. Although there were earlier studies of the cost of living for wage earners, the CPI, formerly called the Cost-of-Living Index, was started by the United States government at the time of World War I. From the beginning, the CPI has been used in the evaluation and adjustment of wages by union and management negotiators. It has also been used in other types of contract-escalation provisions: leases, service contracts, annuities and pensions, welfare allowances, and alimony payments. The CPI is also widely used as a guide to decisions about economic policy.

A 1964 revision made the CPI more representative of the total urban wage-earner and clerical-worker population. The items included, and their weights, are obtained from data on this group’s expenditures. They cover goods and services purchased by this group in a given year.

Data collected in the 1960-1961 Surveys of Consumer Expenditures were tabulated for the CPI. The surveys were conducted in 66 urban areas: 33 Standard Metropolitan Statistical Areas (SMSAs) and 33 nonmetropolitan urban places. The sample included 517 workers who lived alone and 4343 families of two or more persons. Price data used for the periodic updating of the index are collected from 56 urban areas. The concept behind the CPI is to compare, for different periods, the cost of some fixed set of goods that are representative of all purchases made by urban wage and clerical workers. This type of index is popularly known as a market-basket index. Technically, it is a price index with fixed or constant weights.

It is the policy of the Bureau of Labor Statistics, which produces the CPI, to make it as generally available as possible. The National Consumer Price Index is released monthly from Washington via a press release. There is then a formal press conference late in the month following that to which the data refer. A more complete report is issued about two weeks later. Average U.S. indexes are published monthly for certain groups of expenditures. A government publication called The Consumer Price Index: History and Techniques gives a detailed account of the CPI. [Eisele (1975) also describes and discusses the index.]

Types of Economic Index Numbers

There are three important types of economic index numbers. (1) Price indexes measure the change in prices paid and prices received by producers and consumers. (2) Quantity indexes measure changes in production and shipment. (3) Value indexes measure changes in the value of various commodities and activities.

We can construct index numbers from a single series, such as the price of a single commodity. An index number of this type is called a simple index or price relative. We are often more interested in index numbers combining figures relating to several variables, such as the prices of several commodities. Index numbers constructed in this manner are called composite indexes. We shall limit our discussion to this type of index.

We may further categorize index numbers on the basis of the mathematical method used in their construction. We construct the aggregative type of index by summing (aggregating) figures for one time period, called the nonbase period. We compare them with a similar aggregate of figures for another period, called the base period. We obtain an average-of-relatives index by first calculating a relative for each item, then taking the average of the relatives. We can further classify both aggregative and average-of-relatives indexes as either weighted or unweighted, depending on whether or not we assign specific weights to the items in the computational process. Actually the term unweighted index may be a misnomer, since all items are weighted equally in the absence of formal weighting factors. The weighting scheme is inherent in the method of calculation. For this reason some writers refer to the index for which the analyst does not assign weights as a simple index.


13.8 AGGREGATIVE PRICE INDEXES 

This section discusses the methods of constructing aggregative price indexes. We 
shall consider first the unweighted, then the weighted index. 

Unweighted Aggregative Price Index


EXAMPLE 13.8.1 To illustrate the construction of an unweighted aggregative price 
index, consider a collection of five foods consumed by a typical family in a certain 
city. The base year is 1973 and the nonbase year is 1983. Table 13.8.1 shows 
the data, which are fictitious. 

The unweighted aggregative price index is given by

\[
\left( \frac{\sum P_n}{\sum P_0} \right) 100 \tag{13.8.1}
\]

where \(P_0\) is the price in the base period and \(P_n\) is the price in the nonbase period.

From the data, we can use Equation 13.8.1 to calculate the following unweighted
aggregative index for 1983 on 1973 as a base:

\[
\left( \frac{\$3.65}{\$2.77} \right) 100 = 131.8
\]
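
The calculation can also be checked with a few lines of code. The following is only a minimal sketch: the prices are those of Table 13.8.1, while the function name `unweighted_aggregative_index` is our own label for illustration and does not come from the text.

```python
# Unit prices from Table 13.8.1 (fictitious data used in Example 13.8.1).
prices_1973 = {"Meat": 0.90, "Milk": 0.29, "Eggs": 0.45, "Bread": 0.28, "Coffee": 0.85}
prices_1983 = {"Meat": 1.10, "Milk": 0.35, "Eggs": 0.65, "Bread": 0.40, "Coffee": 1.15}


def unweighted_aggregative_index(base_prices, nonbase_prices):
    """Equation 13.8.1: (sum of nonbase prices / sum of base prices) * 100.

    The function name is hypothetical; only the formula comes from the text.
    """
    return sum(nonbase_prices.values()) / sum(base_prices.values()) * 100


index_1983 = unweighted_aggregative_index(prices_1973, prices_1983)
print(round(index_1983, 1))  # 131.8
```

Passing the 1973 prices as both arguments gives 100, the base-year value of the index.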



TABLE 13.8.1 Unit prices for selected food commodities, 1973 and 1983

                          Unit price
Food commodity      1973 (P_73)    1983 (P_83)

Meat                   $0.90          $1.10
Milk                    0.29           0.35
Eggs                    0.45           0.65
Bread                   0.28           0.40
Coffee                  0.85           1.15
Total                  $2.77          $3.65

Since

\[
\left( \frac{\sum P_0}{\sum P_0} \right) 100 = \left( \frac{\$2.77}{\$2.77} \right) 100 = 100
\]

the base-year index equals 100.

We can interpret the index figure of 131.8 as follows: It would have cost 31.8%
more in 1983 than in 1973 to buy the goods in Table 13.8.1. Stated another way,
the cost of the items in 1983 is 131.8% of the cost of the same items in 1973.

Generally the unweighted aggregative index is an unsatisfactory measure because
of two limitations. First, the index is unduly influenced by high-priced items.
Suppose, for example, that Table 13.8.1 had contained a sixth item that cost $5.00
in 1973 and $4.00 in 1983. The totals for 1973 and 1983 would have been $7.77
and $7.65, respectively. The index, therefore, would have been ($7.65/$7.77)100 =
98.5, indicating a decline in price of 1.5%. Thus one high-priced item would
have enough influence to cause the overall index to show a decline, even though
five out of six items actually increased in price.

The second limitation of the unweighted aggregative index is the effect of the
arbitrary nature of the units of measurement for the items. When, for example,
the unit for milk is a gallon, the price index is different from the index that results
when the unit is a quart.

These limitations indicate a need for a more satisfactory measure. Such a measure
is provided by the weighted aggregative index.


Weighted Aggregative Index, Base-Year Weights (Laspeyres Index)

We can overcome the limitations of the aggregative price index just described by 
the proper use of weights. We can get a better index for the data of Table 13.8.1 
by assigning to each item a weight that reflects the amount of each item the typical 
family actually buys. These weights, which reflect the amounts or quantities of 
the items under consideration, are called quantity weights. 

Weights may represent quantities of items bought during either the base period 
or the nonbase period. Appropriate base-period weights for the data of Table 
13.8.1 would be average amounts of each item consumed per week during 1973. 
If the base-period weights are denoted by \(Q_0\) (for quantity), the weighted
aggregative index with base-period weights is given by

\[
\left( \frac{\sum P_n Q_0}{\sum P_0 Q_0} \right) 100 \tag{13.8.2}
\]


This index, which measures the change in the total cost of a fixed bill of goods, 
is also known as the Laspeyres index, after the German statesman and economist 
Etienne Laspeyres (1834-1913), who first proposed the use of the formula. 

EXAMPLE 13.8.2 To illustrate the calculation of a weighted aggregative index using 
base-period weights, assume that the 1973 average weekly purchased quantities 
of the items in Table 13.8.1 were as shown in Table 13.8.2. The last two columns 
of Table 13.8.2 show that in 1983 it would cost $6.25 to buy the same goods that 
cost $4.81 in 1973. The weighted aggregative index for the example, then, is

\[
\left( \frac{\$6.25}{\$4.81} \right) 100 = 129.9
\]

Thus it costs 29.9% more in 1983 to purchase the same goods. 
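
As a hedged illustration of Formula 13.8.2, the sketch below recomputes the two cost totals and the index from the prices and 1973 quantities of Table 13.8.2. The function name `laspeyres_index` and the variable names are assumptions made for this example, not notation from the text.

```python
# Prices and 1973 quantities from Table 13.8.2 (fictitious data from the text).
# Order of items: Meat, Milk, Eggs, Bread, Coffee.
p73 = [0.90, 0.29, 0.45, 0.28, 0.85]   # base-period prices, P_0
p83 = [1.10, 0.35, 0.65, 0.40, 1.15]   # nonbase-period prices, P_n
q73 = [2, 3, 1, 3, 1]                  # base-period quantities, Q_0


def basket_cost(prices, quantities):
    """Total cost of a fixed bill of goods: sum of price * quantity."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres_index(base_prices, nonbase_prices, base_quantities):
    """Formula 13.8.2: (sum P_n Q_0 / sum P_0 Q_0) * 100 (hypothetical helper name)."""
    return (basket_cost(nonbase_prices, base_quantities)
            / basket_cost(base_prices, base_quantities) * 100)


cost_1973 = basket_cost(p73, q73)   # about 4.81
cost_1983 = basket_cost(p83, q73)   # about 6.25
laspeyres = laspeyres_index(p73, p83, q73)
print(round(laspeyres, 1))  # 129.9
```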

One deficiency of the weighted aggregative index is that it assumes a fixed 
consumption pattern. That is, it assumes that the quantities of items bought in the 
nonbase period are the same as those in the base period. If the base period and 
the nonbase period are close together, this may not pose a very serious problem. 
If, however, the time interval between the base period and the nonbase period is 
great, the assumption may be unrealistic, and the index may lose much of its 
usefulness. 


Weighted 
Aggregative Index, 
Nonbase-Period 
Weights (Paasche's 
Index) 


Suppose that instead of using base-period weights, we use weights that reflect the 
quantities of the index items bought during the nonbase period. If these weights
are denoted by \(Q_n\), the weighted aggregative index is given by

\[
\left( \frac{\sum P_n Q_n}{\sum P_0 Q_n} \right) 100 \tag{13.8.3}
\]
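
A short sketch may help make Formula 13.8.3 concrete. The prices below are those of Table 13.8.1, but the text supplies no nonbase-period quantities for this example, so the 1983 quantities are hypothetical values invented purely for illustration. The function name `nonbase_weighted_index` is also our own.

```python
# Prices from Table 13.8.1. Order of items: Meat, Milk, Eggs, Bread, Coffee.
p73 = [0.90, 0.29, 0.45, 0.28, 0.85]   # base-period prices, P_0
p83 = [1.10, 0.35, 0.65, 0.40, 1.15]   # nonbase-period prices, P_n

# HYPOTHETICAL 1983 quantities (not from the text), used only to show the arithmetic.
q83_hypothetical = [2, 3, 2, 2, 1]     # nonbase-period quantities, Q_n


def nonbase_weighted_index(base_prices, nonbase_prices, nonbase_quantities):
    """Formula 13.8.3: (sum P_n Q_n / sum P_0 Q_n) * 100."""
    numerator = sum(p * q for p, q in zip(nonbase_prices, nonbase_quantities))
    denominator = sum(p * q for p, q in zip(base_prices, nonbase_quantities))
    return numerator / denominator * 100


index_nonbase_weights = nonbase_weighted_index(p73, p83, q83_hypothetical)
print(round(index_nonbase_weights, 1))
```

The only difference from the base-period version is the set of quantities used as weights. If the nonbase-period quantities happened to equal the base-period quantities, the two formulas would give the same figure.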


TABLE 13.8.2 Unit prices, 1973 and 1983, and average weekly quantities bought in 1973, selected food commodities

                  Unit price        Quantity
Food            1973     1983         1973
commodity       P_73     P_83         Q_73      P_73 Q_73    P_83 Q_73

Meat           $0.90    $1.10           2         $1.80        $2.20
Milk            0.29     0.35           3          0.87         1.05
Eggs            0.45     0.65           1          0.45         0.65
Bread           0.28     0.40           3          0.84         1.20
Coffee          0.85     1.15           1          0.85         1.15
Total          $2.77    $3.65                     $4.81        $6.25

The use of current (nonbase period) weights in calculating the aggregative index
was first proposed by Hermann Paasche (1851-1925), another German economist.
Therefore the index given by Formula 13.8.3 is known as the Paasche index.

One disadvantage of the Paasche index is that we cannot make year-to-year
comparisons of price changes when we use current-period weights. Another
limitation is the need to obtain new weights for each current period of interest. The
practical considerations relating to the gathering of data on current purchasing
habits for each new time period are overwhelming. As a consequence, the Paasche
index is not widely used.


Weighted Arithmetic Mean of Relatives Index

Just as we can compute aggregative indexes, so we can compute indexes based 
on averages. As with aggregative indexes, indexes computed by an averaging 
process may also be weighted or unweighted. The most useful and widely used 
index based on averages is a weighted index. Thus we shall limit our discussion 
to that index. 

Although we can use any method of averaging in computing an index, we 
ordinarily use the arithmetic mean. The index we are considering, then, is called 
the weighted arithmetic mean of relatives index. The weights used in this index 
are called values, since they reflect the values of the items consumed, produced,
shipped, purchased, and so on. Table 13.8.2, for example, shows that the quantity
of meat consumed is 2 units. The value for meat is obtained by multiplying the
quantity by the unit price in 1973, $0.90, to obtain the 1973 value of $1.80. In
general, then, value = price × quantity.

Note also that the weights used need not be for the base period to which the 
prices pertain. It is desirable, however, that the weights be fixed—that is, not 
based on the current period—in order to allow for period-to-period comparisons. 

If the weights are denoted by W, the weighted arithmetic mean of relatives
index is given by

\[
\frac{\sum \left[ \left( \frac{P_n}{P_0} \right) 100 \right] W}{\sum W} \tag{13.8.4}
\]

EXAMPLE 13.8.3 We can illustrate the construction of a weighted arithmetic mean 
of relatives index using the data from Example 13.8.1. Table 13.8.3 reproduces 
the data, along with some additional necessary calculations. Using Formula 13.8.4, 
we find the weighted arithmetic mean of relatives index for the present example 
to be 


\[
\frac{\$624.99}{\$4.81} = 129.9
\]


Note that this result is the same as the Laspeyres index obtained by Formula 
13.8.2. A little manipulation of Formula 13.8.4 shows that when the fixed weights 
of Formula 13.8.4 are base-period values, this formula reduces to Formula 13.8.2. 
Thus we interpret the results of the two methods in the same way. 
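
The equivalence noted above can be checked numerically. The sketch below is an illustration under our own naming (the variable names are not from the text): it forms the price relatives and the base-period value weights from the example data, takes the weighted mean, and compares the result with the base-period weighted aggregative index.

```python
# Prices and 1973 quantities from the example data.
# Order of items: Meat, Milk, Eggs, Bread, Coffee.
p73 = [0.90, 0.29, 0.45, 0.28, 0.85]
p83 = [1.10, 0.35, 0.65, 0.40, 1.15]
q73 = [2, 3, 1, 3, 1]

# Value weights W = P_73 * Q_73 (base-period values).
weights = [p0 * q0 for p0, q0 in zip(p73, q73)]

# Price relatives (P_83 / P_73) * 100, kept unrounded here.
relatives = [pn / p0 * 100 for p0, pn in zip(p73, p83)]

# Weighted arithmetic mean of relatives (Formula 13.8.4).
mean_of_relatives = sum(r * w for r, w in zip(relatives, weights)) / sum(weights)

# Base-period weighted aggregative index (Formula 13.8.2) for comparison.
aggregative = (sum(pn * q0 for pn, q0 in zip(p83, q73))
               / sum(p0 * q0 for p0, q0 in zip(p73, q73)) * 100)

print(round(mean_of_relatives, 1), round(aggregative, 1))  # 129.9 129.9
```

Each weighted relative is (P_n/P_0)(100)(P_0 Q_0) = 100 P_n Q_0, which is why the two formulas agree. The code keeps the relatives unrounded, so its numerator is 625.00 rather than the 624.99 obtained from relatives rounded to one decimal place; the index is 129.9 either way.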









TABLE 13.8.3 Data needed to calculate the weighted arithmetic mean of relatives index, Example 13.8.3

                   Price         Price relatives   Quantity   Weight           Weighted price relatives
Food            1973    1983                         1973
commodity       P_73    P_83     (P_83/P_73)100      Q_73     W = P_73 Q_73    [(P_83/P_73)100]W

Meat           $0.90   $1.10         122.2             2         $1.80             $219.96
Milk            0.29    0.35         120.7             3          0.87              105.01
Eggs            0.45    0.65         144.4             1          0.45               64.98
Bread           0.28    0.40         142.9             3          0.84              120.04
Coffee          0.85    1.15         135.3             1          0.85              115.00
Total                                                            $4.81             $624.99


There is a reason for having two methods of arriving at the same result. In 
certain situations, value weights may be more readily available than quantity 
weights. Or price data may be available in the form of relative rather than absolute 
values. In such cases, Formula 13.8.4 is better than Formula 13.8.2. 

The treatment of index numbers in this chapter is incomplete. However, this 
presentation of some key basic concepts should help you understand and appreciate 
the index numbers used in business. [For a more complete treatment of the topic, 
see the books by Crowe (1965), Fisher (1922), and Mudgett (1951).] 


Exercises 


13.8.1 The following prices and quantities reflect the average weekly buying habits of a 
typical family in 1978 and 1983. Construct the following: (a) price relatives for each item; 
(b) unweighted aggregative index for 1983 on 1978 as a base; (c) weighted aggregative 
index with base-period weights (Laspeyres index); (d) weighted aggregative index with 
nonbase-period weights (Paasche’s index); (e) weighted arithmetic mean of relatives index. 
Explain the meaning of each of the computed indexes. 



                   1978                      1983
Item        Unit price   Quantity     Unit price   Quantity

Apples        $0.15         2           $0.25         1
Milk           0.30         2            0.35         2
Bread          0.30         3            0.40         3
Eggs           0.50         1            0.65         1


13.8.2 A manufacturer uses four raw materials in the production of a product. The
following table shows the average prices and quantities of the four raw materials used in
1978 and 1983. Construct the following: (a) price relatives for each raw material; (b)
unweighted aggregative index for 1983 on 1978 as a base; (c) weighted aggregative index
with base-period weights; (d) weighted aggregative index with nonbase-period weights;
(e) weighted arithmetic mean of relatives index. Explain the meaning of each of the
computed indexes.


Raw               1978                 1983
material     Price   Quantity     Price   Quantity

A             $10       10         $12       15
B               3       20           5       25
C               5       50          10       60
D               2       30           5       40


Summary

We based the first part of the discussion in this chapter on the classical concept
that a typical time series has four components: trend, cycle, seasonal, and irregular
variation. We defined each of these components, and explained methods for isolating
the trend, seasonal, and cyclical components. The two objectives in time-series
analysis are (1) to try to understand the past and present, and (2) to forecast
the future, as far as possible. The calculations needed for time-series analysis are
often tedious. We can avoid much of the drudgery of the calculations by using a
computer. [For additional treatments of time-series analysis, see the books by
Anderson (1971), Brown (1963), Davis (1941), and Newbury (1952).]

Examples of time-series data may be found in such U.S. government publications
as Survey of Current Business and its weekly supplement, Business Statistics;
Federal Reserve Bulletin, Economic Indicators; and Historical Statistics of the 
United States. Annual reports by business firms and local and state government 
agencies, as well as publications of various industrial organizations, also contain 
time-series data. Examples of the latter include the Annual Statistical Report of 
the American Iron and Steel Institute and Life Insurance Fact Book published by 
the Institute of Life Insurance. 

This chapter also covered the concepts and methods involved in the construction 
and interpretation of index numbers. It presented methods for calculating weighted 
and unweighted aggregative indexes and the weighted arithmetic mean of relatives 
index. 

In addition to the Consumer Price Index, discussed earlier, some other important 
indexes are Producer Price Indexes, published by the federal government, and 
the Federal Reserve Board’s Industrial Production. 

Review Questions

1. What is a time series?

2. Why is time-series analysis of interest to the businessperson? 

3. List and briefly discuss the four components of time series. 

4. Explain Y = T · S · C · E.

5. Explain the method of assigning year codes to the years in time-series data: (a) when 
the number of years is odd, (b) when the number of years is even. 

6. What are the disadvantages of the ratio-to-moving-averages method for measuring 
trend? 

7. Define the following terms: (a) moving average, (b) deseasonalized data, (c) cyclical 
relatives, (d) cyclical irregulars, (e) modified mean. 

8. Obtain some time-series data relevant to your area of interest and analyze them by as 
many of the analytical procedures of this chapter as seem appropriate. 

9. What is an index number? 

10. Define: (a) price index, (b) quantity index, (c) value index, (d) price relative, (e) 
simple index, (f) composite index. 

11. Explain in words the method of constructing the Laspeyres index. 

12. What are the disadvantages of the Laspeyres index? 

13. Explain in words the method of constructing the Paasche index. 

14. What are the disadvantages of the Paasche index? 






15. The following table gives the annual dividends per share as given in the annual report
of a corporation. (a) Plot the raw data. (b) Compute and plot the least-squares trend line.
(c) Compute \(y_t\) for each year. (d) Compute a three-year moving average. (e) Compute and
plot the cyclical relatives.

         Dividends               Dividends
Year     per share      Year     per share

1972       1.62         1979       2.20
1973       1.70         1980       2.30
1974       1.75         1981       2.32
1975       1.85         1982       2.42
1976       2.00         1983       2.48
1977       2.10         1984       2.50
1978       2.10




16. The following table shows the average ordinary life policy issued by an insurance
company between 1969 and 1983. (a) Plot the raw data. (b) Compute and plot the
least-squares trend line. (c) Compute \(y_t\) for each year. (d) Compute a five-year moving
average. (e) Compute and plot the cyclical relatives.

         Average                  Average
         ordinary                 ordinary
Year     policy size     Year     policy size

1969       $3650         1977       $5780
1970        3910         1978        6100
1971        4120         1979        6450
1972        4450         1980        6775
1973        4690         1981        7240
1974        4950         1982        7680
1975        5210         1983        8100
1976        5430



17. The following table shows the percentage of its assets that a large firm invested in
government securities for the years 1969 through 1983. (a) Plot the raw data. (b) Compute
and plot the least-squares trend line. (c) Compute \(y_t\) for each year. (d) Compute a
three-year moving average. (e) Compute and plot the cyclical relatives.

         Percent of               Percent of
Year     assets          Year     assets

1969       9.6           1977       5.8
1970       9.5           1978       5.4
1971       9.0           1979       5.1
1972       8.5           1980       4.7
1973       7.7           1981       4.4
1974       7.0           1982       4.5
1975       6.5           1983       4.3
1976       6.0



14. Elementary Survey Sampling


Chapter Objectives: Sampling plays a crucial role in 
statistical-inference procedures. We have emphasized 
this point in previous chapters. In this chapter, you get 
a more extensive look into the topic of sampling. 

After studying this chapter and working the exercises, 
you should be able to do the following. 

1. Understand and appreciate the basic concepts and 
procedures used in a sample survey 

2. Draw the following kinds of samples: (a) stratified, 
(b) cluster, and (c) systematic 

3. Estimate means and totals from the data resulting 
from each of the three sampling plans 

4. Compute sample sizes for each of the three sampling plans

14.1 INTRODUCTION


What qualities do the homemakers of a particular section of a city value most in 
a laundry detergent? One way to find out is to ask them. What proportion of items 
produced by a certain machine are defective? We can examine all the items produced
by the machine and keep a record of defective items found. What is the
average age of adults living in a certain section of a city? We can interview each 
adult living in the area and ask for his or her age. 

Needless to say, these methods of getting answers to these questions are in 
most cases impractical. This is because if a large number of homemakers live in 
the area of interest, if the machine produces thousands of items, or if we want 
the age of several thousand adults, the cost of interviewing every person or examining
every item is usually prohibitive. Therefore we get answers to these
questions by sampling. 

Sampling is not new or even recent in human experience. The common practice 
of taking a small portion of food or other material for tasting or testing undoubtedly 
precedes recorded history. However, until recent years, people have paid little 
attention to developing sampling methods that have desirable properties. The method 
of sampling doesn’t matter as long as the material or population being sampled is 
uniform, since, in this case, any kind of sample gives about the same result. When 
the material is not uniform, as is usually the case, the method by which we obtain 
the sample is crucial. Then the study of techniques to ensure a trustworthy sample 
becomes very important. 

One objective of sampling is to obtain estimates of one or more population
characteristics, or parameters. The “goodness” of a sample depends on how well
it estimates the parameters of interest. The difference between an estimate and 
the true value of the parameter being estimated is called the sampling error. The 
sampling error for any given estimate is usually unknown, since the parameter 
being estimated is surely unknown, or else there would be no reason for sampling. 
We can get around this problem of an unknown sampling error to a large extent 
if we use probability sampling, since we can get estimates of the sampling error 
from the sample itself. If we use any method of selecting samples that does not 
involve probability sampling, there is no known way of estimating the sampling 
error. Therefore, this chapter deals entirely with probability sampling. 

Note that sampling as described in this chapter is only one way of obtaining 
useful information for decision making. Chapter 8 introduced another type of 
special study, designed experiments. Through sampling, we try to determine the 
status of an existing population or universe without affecting the units involved. 
In designed experiments, on the other hand, we carefully select units to be included 
in the study, determine control variables and their levels, and, usually, determine 
the different treatments (production techniques, advertising techniques, and so on) 
to be randomly assigned to the units. The greatest difference between the two 
types of studies is that in sampling the investigator is limited to the role of an 
observer. He or she exercises no control over the observations made or measurements
taken.



14.2 APPLICATIONS 


In recent years sampling techniques have been used more and more to obtain 
information in many areas, such as the following. 

1. Public opinion on the outcome of political elections before the elections; TV 
ratings, war issues, taxes, and so on 

2. Market research to determine consumer preferences and the effectiveness of a 
variety of advertising policies 

3. Quality-control procedures for manufacturing processes 

4. Accounting and auditing 

5. Forecasts of crop production 

6. Determinations of the incidence and prevalence of specific diseases or conditions
within a given geographic area (city, county, state, region, or nation, for
example) through the National Health Survey

7. Research relating to many social and economic problems 

8. Determinations of such population characteristics as employment status, income,
and education


14.3 BASIC THEORY 

We discussed the basic concepts of simple random sampling (both replacement 
and nonreplacement) in previous chapters. We have used statistics calculated from 
sample results to estimate several population parameters and the associated standard 
errors. We have shown how the central limit theorem lets us approximate the 
sampling distributions involved in the statistical inference procedures of constructing
confidence intervals and testing hypotheses. In fact, we can apply all the
inferential procedures described in this text to data obtained by sampling, using 
either the techniques in this chapter or simple random sampling as discussed in 
Chapter 5. 

We shall use the concepts, definitions, and theory of simple random sampling 
as the foundation for the additional concepts that you will need in order to understand
the sample designs to be covered in this chapter.

14.4 ADDITIONAL CONCEPTS 

We have to expand the concepts and terminology used in basic sampling theory
when we discuss sample surveys in general. This is because (1) in practice, we
use random, nonreplacement sampling almost exclusively; (2) we must identify
separately the units on which we perform the sampling operation and the units on
which we take measurements; and (3) we must take into account a variety of possible
sample designs.

The following terms are important in the consideration of survey sampling.

1. Observational unit. An observational unit is the unit (entity) on which some 
measurement is taken or some classificatory assignment is made. In Chapter 2 we 
referred to an observational unit as an entity. We use the term observational unit 
in this chapter, since it is more common. 

2. Sampling unit. The sampling unit is the unit on which the sampling operation 
is based. It may be the individual elements (observational units) of the universe, 
or it may be groups (clusters) of these individual elements. For example, the 
sampling unit may be a household instead of an individual. 

3. Universe (universe of inquiry or investigation). A universe is that set of 
elements whose characteristics we wish to study. If we can examine only a sample 
of elements, the universe is the set of elements about which we wish to generalize 
from the sample. 

4. Population. This term refers to a set of characteristics of a universe. If a 
universe is a certain set of individuals, the sets of ages and weights would be two 
different populations, respectively, for the single universe. Throughout the rest of 
this chapter, we shall use the terms population and universe interchangeably. 
However, if multiple measurements are involved, the ability to distinguish between
an observational unit and the measurement taken on it can be very important.

5. Sample. A sample is a fraction or portion of the elements in a universe. The 
word is also used to refer to a part of a population. 

6. Sampling frame. A sampling frame is a list or other representation of the 
sampling units. A given universe may contain several different frames. 

7. Sampling fraction. This term indicates the proportion of the sampling frame 
included in the sample. 

8. Cluster. A cluster is a group of elements from a universe that can be treated 
as a single sampling unit for sampling purposes. 

9. Cluster sampling. Cluster sampling is a sampling method that uses groups or 
clusters of elements in the universe as sampling units. 

10. Stratified sampling. In stratified sampling, we divide the universe into strata 
(subuniverses) and select a sample independently from each stratum. 

11. Size of sample. This term refers to the number of sampling units included in 
the study. 

12. Probability sampling. This term refers to sampling plans in which the probabilities
of selection are known and nonzero for every sampling unit in the universe.

13. Gap. The gap is the difference between the elements contained in the universe 
of interest and those contained in the sampling frame. The difference between the 
population of school-age children as shown on the school rolls (the universe) and 
children actually in school on a given day (the frame) is an example of a gap. 



14.5 STEPS INVOLVED IN A SAMPLE SURVEY 


A sample survey is a scientific study. Therefore we should plan and conduct it in 
a systematic manner. The main steps in a sample survey are: 

1. Statement of objectives. The statement of objectives must be specific. Without 
a clear statement of the questions to be answered, we can become so engrossed 
in the details of planning and conducting a complex survey that we make decisions 
that are not in accord with the true aims of the study. 

2. General considerations of survey design. This step deals with questions such 
as the following: Is the information needed already available, either within or 
outside the organization? Is a sampling study appropriate? How are measurements 
to be made? Are measurements both possible and practical? What degree of accuracy
is required? What funds and other resources are available?

3. Sample design. If we are using sampling methods, the design of the sample is 
extremely important. We must consider such things as choice of sampling unit, 
possibility of stratification, size of sample, methods of dealing with hard-to-get 
observations, and the cost of each operation. 

4. Making the determinations. The accuracy and usefulness of the results of any 
study are limited by the accuracy of the raw data. Results from a sample can be 
better than those obtained from surveying an entire population only if the raw 
data obtained by sampling are better. Therefore the method of making the determinations,
the construction and use of questionnaires, the methods of training and
controlling the investigator (the interviewer or the observer), and the detailed 
methods of dealing with hard-to-get determinations are all important to the validity 
of the results. 

5. Summary and analysis. Often, the expense and work involved in editing, coding,
key punching, tabulating, analyzing, and presenting data exceed the work
and expense needed to obtain them. We should plan for the analysis and estimate 
its cost when we plan the survey—not after we have collected the data. Planning 
and conducting a sample survey are not to be taken lightly. We should realize 
that the theoretical problems are only one aspect of such an undertaking. 

This chapter presents three methods of sampling: stratified random sampling,
cluster sampling, and systematic sampling. Since we covered simple random sampling,
its methods, and the inferential procedures based on it in considerable detail
in Chapters 5, 6, and 7, we will not discuss them in this chapter.

14.6 STRATIFIED RANDOM SAMPLING 

In simple random sampling, the observational unit is the sampling unit, and the 
sampling frame is the population of interest (if no gap exists). In stratified sampling,
the observational unit is still the sampling unit. However, the population
of interest is subdivided into subpopulations (strata) based on a known variable
that is associated with the measurement to be made on the observational units.

Stratification is similar to blocking in experimental design. The units in a stratum
should be more like each other (homogeneous) relative to the measurement
of interest than they are like the units in other strata. If a population is to be 
effectively stratified, we must have information on the variable to be used for 
stratification purposes for every element in the entire population. In practice, then, 
we must be able to identify every element in the population, and we must be able 
to use the stratification variable to assign each element to its proper stratum. 

For example, if we are interested in the average income for a population of 
individuals, a representative sample might contain some laborers, some business 
persons, and some professional people. Simple random sampling would not ensure 
that each of these groups was represented. By stratifying the population into the 
three occupational groups and selecting some individuals from each group, we 
can assure the representation of each. The reason we thought of occupational 
groups in the first place is that we recognized an association between income and 
occupation. 

As another example, consider a study designed to estimate gross sales of a 
certain type of retail store. What known variable is associated with sales? We 
might use floor space or the number of employees per store as a basis for stratification.

Effective stratification provides estimators that have smaller variances than 
estimators resulting from simple random sampling. This is because effective stratification
produces subgroups that are more homogeneous than the original population.
Other advantages that often result from stratified random sampling are
reduced costs and greater administrative convenience. This occurs because we can 
usually locate the observational unit in stratified random sampling more easily 
than in simple random sampling. For example, the strata involved may be subdivisions
of a larger geographic area that is under study. Such an arrangement
lets individual surveyors or teams of surveyors be assigned to a relatively small 
area. This enables a person to concentrate on a small area rather than the total 
geographic area of interest. In addition, smaller geographic areas mean shorter 
distances and reduced travel time for individual surveyors. 

The fundamental concept involved is the ability to improve the efficiency of 
sampling by using a known variable that is associated with the variable of interest 
(unknown) to form subpopulations, with the units in each subpopulation (stratum) 
being as alike as possible. We then draw a simple random sample from each 
stratum and pool the results to get an estimate of the parameter(s) of interest for 
the total population. 
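
The selection step just described can be sketched in code. This is only an illustration under stated assumptions: the strata, the unit labels, and the per-stratum sample sizes below are hypothetical, and the function name `draw_stratified_sample` is our own. The sketch uses Python's standard `random.sample`, which selects without replacement.

```python
import random


def draw_stratified_sample(strata, sample_sizes, seed=None):
    """Draw a simple random sample, without replacement, from each stratum.

    strata:       dict mapping stratum name -> list of units in that stratum
    sample_sizes: dict mapping stratum name -> number of units to select
    """
    rng = random.Random(seed)  # a seed makes the illustration reproducible
    return {
        name: rng.sample(units, sample_sizes[name])
        for name, units in strata.items()
    }


# HYPOTHETICAL universe: unit labels grouped by occupational stratum.
strata = {
    "laborers": [f"L{i}" for i in range(1, 61)],
    "business": [f"B{i}" for i in range(1, 31)],
    "professional": [f"P{i}" for i in range(1, 11)],
}
sample_sizes = {"laborers": 6, "business": 3, "professional": 2}

sample = draw_stratified_sample(strata, sample_sizes, seed=1)
for name, units in sample.items():
    print(name, units)
```

Because each stratum is sampled separately, every occupational group is certain to appear in the sample, which simple random sampling of the whole universe would not guarantee.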

In simple random sampling, we call the ith observation in the population \(x_i\). In
stratified sampling, for each observation, we use a label that denotes both the
stratum containing the observation and the observation in the stratum that is being
referred to. A typical observation is labeled \(x_{hi}\). This label indicates that we are
referring to the ith observation in the hth stratum.

We estimate the population total T by obtaining a separate estimate \(\hat{T}_h\) of each
stratum total (this is the reason for obtaining a separate sample from each stratum).
We then sum these estimates to get the estimated total for the entire population.
In formula form,

\[
\hat{T}_{st} = \hat{T}_1 + \hat{T}_2 + \cdots + \hat{T}_h + \cdots + \hat{T}_L \tag{14.6.1}
\]

where

\(\hat{T}_{st}\) = estimated population total using stratified sampling

L = number of strata

\(\hat{T}_1 = N_1 \bar{x}_1\) = the mean of the sample drawn from stratum 1 multiplied by
the number of units in stratum 1

\(\hat{T}_2 = N_2 \bar{x}_2\) = the mean of the sample drawn from stratum 2 multiplied by
the number of units in stratum 2

\(\hat{T}_h = N_h \bar{x}_h\) = the mean of the sample drawn from stratum h multiplied by
the number of units in stratum h

\(\hat{T}_L = N_L \bar{x}_L\) = the mean of the sample drawn from the last stratum multiplied
by the number of units in the last stratum

The form that we will normally use for calculation purposes is

\[
\hat{T}_{st} = N_1 \bar{x}_1 + N_2 \bar{x}_2 + \cdots + N_h \bar{x}_h + \cdots + N_L \bar{x}_L = \sum_{h=1}^{L} N_h \bar{x}_h \tag{14.6.2}
\]
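
To make the computational form concrete, here is a minimal sketch. The stratum sizes and sample values are hypothetical numbers chosen so the arithmetic is easy to follow, and the function name `stratified_total` is our own.

```python
from statistics import mean


def stratified_total(stratum_sizes, samples):
    """Estimated population total: sum over strata of N_h times the sample mean."""
    return sum(stratum_sizes[h] * mean(samples[h]) for h in stratum_sizes)


# HYPOTHETICAL data: two strata with N_1 = 100 and N_2 = 50 units.
stratum_sizes = {"stratum 1": 100, "stratum 2": 50}
samples = {
    "stratum 1": [10, 12, 14],   # sample mean 12
    "stratum 2": [20, 22],       # sample mean 21
}

estimated_total = stratified_total(stratum_sizes, samples)
print(estimated_total)  # 100 * 12 + 50 * 21 = 2250
```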

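The variance of \(\hat{T}_{st}\) is given by

\[
V(\hat{T}_{st}) = \sum_{h=1}^{L} N_h^2 \left( \frac{N_h - n_h}{N_h} \right) \frac{S_h^2}{n_h} \tag{14.6.3}
\]

where \(N_h\) = number of units in the hth stratum

\(n_h\) = number of units in the sample drawn from the hth stratum

\(S_h^2 = [1/(N_h - 1)] \sum_{i=1}^{N_h} (x_{hi} - \mu_h)^2\), that is, the variance of the units
contained in the hth stratum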

The term \((N_h - n_h)/N_h\) in Equation 14.6.3 is called the finite population correction
factor (fpc). It is always appropriate when sampling is from a finite population.
The fpc as used here is comparable to the fpc discussed in Chapter 5,
although the formulas are different. The fpc of Chapter 5 is appropriate when \(\sigma^2\)
is used for the population variance, whereas the fpc of this chapter is used when
\(S^2\) is used for the population variance. When \(N_h\) is large relative to \(n_h\), \(n_h/N_h\)
becomes small. The fpc then approaches unity and may be omitted. As noted
before, a general rule states that we can omit the fpc when \(n_h/N_h < 0.05\).

Note that the variance of \(\hat{T}_h\) is

\[ V(\hat{T}_h) = N_h (N_h - n_h) \frac{S_h^2}{n_h} \tag{14.6.4} \]

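We obtain \(V(\hat{T}_{st})\) by summing these variances over all strata. We estimate the
mean for the total population by

\[ \bar{x}_{st} = \frac{\hat{T}_{st}}{N} \tag{14.6.5} \]

The variance of \(\bar{x}_{st}\) is given by

\[ V(\bar{x}_{st}) = \frac{1}{N^2} V(\hat{T}_{st}) = \frac{1}{N^2} \sum_{h=1}^{L} N_h (N_h - n_h) \frac{S_h^2}{n_h} \tag{14.6.6} \]

where \(N = \sum_{h=1}^{L} N_h\).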


We obtain estimates of the variances given in Equations 14.6.3 and 14.6.6 by
estimating each stratum variance and substituting these estimates for the actual
stratum variances, as follows:

\[ \hat{V}(\hat{T}_{st}) = \sum_{h=1}^{L} N_h (N_h - n_h) \frac{s_h^2}{n_h} \tag{14.6.7} \]

\[ \hat{V}(\bar{x}_{st}) = \frac{1}{N^2} \sum_{h=1}^{L} N_h (N_h - n_h) \frac{s_h^2}{n_h} \tag{14.6.8} \]

where

\[ s_h^2 = \frac{1}{n_h - 1} \sum_{i=1}^{n_h} (x_{hi} - \bar{x}_h)^2 \]


We can construct interval estimates of the population mean \(\mu\) and the population
total T by applying the following general formula, introduced in Chapter 6:

Estimate ± (reliability factor) × (standard error)

In applying this formula here, we’ll assume that conditions are suitable for the
application of normal theory. Thus we can use a value from the standard normal
distribution as the reliability factor.

The 100(1 - α)% confidence intervals for the population mean and total,
respectively, are

\[ \bar{x}_{st} \pm z_{\alpha/2} \sqrt{V(\bar{x}_{st})} \tag{14.6.9} \]

and

\[ N\bar{x}_{st} \pm z_{\alpha/2} \sqrt{V(\hat{T}_{st})} \tag{14.6.10} \]


When population variances are unknown, we use sample variances as estimators. 


EXAMPLE 14.6.1 The personnel director of a company wants to estimate the mean
and total number of days employees were absent during the past year. The sampling
frame consists of an alphabetically arranged card file containing one card
for each employee. Since the personnel director has noted in the past that the
number of absences appears to be related to length of time on the job, she decides
to draw a sample of employees stratified on that basis. Three strata are used: under
3 years, 3 through 9 years, and 10 years or longer. These strata are constructed
by sorting the cards according to length of employment. The cards are then separated
into the three length-of-employment categories. Table 14.6.1 shows the
results.

TABLE 14.6.1 Results of the stratified random sample of Example 14.6.1

Stratum               N_h     n_h    x̄_h    s_h^2
Under 3 years         500      50      2       4
3-9 years             700      70      4       5
10 years or longer  1,000     100      5       6
Total               2,200     220

By Equation 14.6.2, we have

\[ \hat{T}_{st} = 500(2) + 700(4) + 1000(5) = 8800 \]

The estimated variance of \(\hat{T}_{st}\), by Equation 14.6.7, is

\[ \hat{V}(\hat{T}_{st}) = \frac{(500)(500 - 50)(4)}{50} + \frac{(700)(700 - 70)(5)}{70} + \frac{(1000)(1000 - 100)(6)}{100} = 103{,}500 \]

The mean and its variance, by Equations 14.6.5 and 14.6.8, respectively, are

\[ \bar{x}_{st} = \frac{8800}{2200} = 4 \quad \text{and} \quad \hat{V}(\bar{x}_{st}) = \frac{103{,}500}{(2200)^2} = 0.02 \]

The 95% confidence interval for the population mean is

\[ 4 \pm 1.96\sqrt{0.02}, \qquad 3.7,\ 4.3 \]

The 95% confidence interval for the population total is

\[ 8800 \pm 1.96\sqrt{103{,}500}, \qquad 8169,\ 9431 \]
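The arithmetic of Example 14.6.1 can be checked with a few lines of code. The following is a minimal sketch, not part of the original text: the function name `stratified_estimates` and the tuple layout `(N_h, n_h, xbar_h, s2_h)` are illustrative assumptions, and the default reliability factor z = 1.96 corresponds to a 95% interval under the normal-theory assumption stated above.

```python
from math import sqrt

def stratified_estimates(strata, z=1.96):
    """Stratified estimates of the total and mean with confidence intervals.

    strata: list of (N_h, n_h, xbar_h, s2_h) tuples, one per stratum
            (hypothetical layout chosen for this sketch).
    """
    N = sum(N_h for N_h, _, _, _ in strata)
    # Equation 14.6.2: sum of stratum size times stratum sample mean
    total = sum(N_h * xbar for N_h, _, xbar, _ in strata)
    # Equation 14.6.7: estimated variance of the estimated total
    var_total = sum(N_h * (N_h - n_h) * s2 / n_h for N_h, n_h, _, s2 in strata)
    # Equations 14.6.5 and 14.6.8: mean and its estimated variance
    mean = total / N
    var_mean = var_total / N**2
    half_mean = z * sqrt(var_mean)
    half_total = z * sqrt(var_total)
    return {
        "total": total,
        "var_total": var_total,
        "mean": mean,
        "var_mean": var_mean,
        "ci_mean": (mean - half_mean, mean + half_mean),
        "ci_total": (total - half_total, total + half_total),
    }

# Data of Table 14.6.1
result = stratified_estimates([(500, 50, 2, 4), (700, 70, 4, 5), (1000, 100, 5, 6)])
print(result["total"], result["var_total"], result["mean"])
print(result["ci_mean"], result["ci_total"])
```

Because the code carries the unrounded variance of the mean (about 0.0214 rather than 0.02), its interval for the mean can differ slightly in the second decimal place from the hand calculation.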


Exercises 


14.6.1 Given the following results of a stratified random sample, find: (a) \(\hat{T}_{st}\), (b) \(\bar{x}_{st}\),
(c) \(\hat{V}(\hat{T}_{st})\), and (d) \(\hat{V}(\bar{x}_{st})\). Construct the 95% confidence interval for (e) \(\mu\) and (f) T.

Stratum    N_h    n_h    x̄_h    s_h^2
1          100     10     35       9
2          200     20     45      10
3          300     30     60       8
Total      600     60

14.6.2 Given the following results of a stratified random sample, find: (a) \(\hat{T}_{st}\), (b) \(\bar{x}_{st}\),
(c) \(\hat{V}(\hat{T}_{st})\), and (d) \(\hat{V}(\bar{x}_{st})\). Construct the 95% confidence interval for (e) \(\mu\) and (f) T.

Stratum      N_h    n_h    x̄_h    s_h^2
1            500     50     60      10
2            700     70     75      20
3          1,000    100    100      35
Total      2,200    220
14.6.3 A market-research firm wants information on the amount of money the families of
a certain county spend for recreation. The firm decides that they should use a stratified
random sample. Stratification is based on the four identifiable groups within the county:
rural families, blue-collar families, middle-class suburban families, and upper-class
suburban families. The variable of interest is the amount a household spent for recreation
during the past year. The results of the survey are given in the following table. Construct
95% confidence intervals for \(\mu\) and T.

Stratum            N_h    n_h    x̄_h    s_h^2
1. Rural           200     20     75      150
2. Blue-collar     300     30    150      300
3. Middle-class    150     15    500    1,000
4. Upper-class      50      5    900    1,200
Total              700     70




14.6.4 A universe consists of 1000 trucking firms. We want to estimate the mean and
total amount of money these firms spend on safety and insurance. Using tons hauled as a
stratifying variable, we identify three strata and take a random sample from each stratum
as follows. Compute: (a) \(\hat{T}_{st}\), (b) \(\bar{x}_{st}\), (c) \(\hat{V}(\hat{T}_{st})\), and (d) \(\hat{V}(\bar{x}_{st})\). Construct the 95%
confidence interval for (e) \(\mu\) and (f) T.

Stratum       N_h    n_h   Amount spent for safety and insurance
                           by firms in sample (thousands of dollars)
1 (small)     500     25   12, 20, 15, 21, 12, 7, 26, 13, 23, 15, 18, 24,
                           20, 24, 20, 25, 16, 6, 7, 17, 9, 19, 12, 21, 12
2 (medium)    300     15   23, 31, 30, 31, 34, 26, 25, 28, 30, 33, 29,
                           19, 32, 41, 37
3 (large)     200     10   85, 83, 68, 48, 37, 32, 36, 48, 64, 68
Total       1,000     50



14.7 CLUSTER SAMPLING 

As suggested by the definitions stated earlier, sampling units are either single 
observational units, such as individual people, or groups of observational units, 
such as households. Often it is difficult or impossible to identify the individual 
units making up the population of interest, as is required in constructing a sampling 
frame for either simple random sampling or stratified sampling. For example, if 
we wish to sample residents of a large city to obtain opinion data, we can be 
certain that no one can identify every individual in order to construct a sampling 
frame of individuals. However, the individual units may all be contained in geographic
areas, such as voting districts or city blocks. In this case, we use this set
of units (voting districts or city blocks) as our sampling frame.

Other groups that may prove helpful in sample surveys in which we wish 
information on elementary units include the following: 

Group Elementary unit 

household household member 

classroom student 

file cabinet file folder 

invoice item 





In cluster sampling, we select a sample of groups from the sampling frame and 
obtain information on all the individual elementary units in the groups selected. 
Such groups are called clusters. If, for each cluster selected in a sample, we retain 
in the sample all the individual units in that cluster, we refer to the procedure as 
simple cluster sampling. If, from each cluster selected, we draw a sample of 
individual units for inclusion in the sample, we call the procedure two-stage cluster 
sampling. 

Often we must use cluster sampling because we have no sampling frame for 
the observational units available and cannot construct one. When we have a choice 
of sampling plans, administrative convenience and economic considerations often 
present overriding arguments in favor of cluster sampling. In fact, we may have 
to choose a sampling plan from among plans utilizing different types of clusters. 
For example, when the observational unit is a household, the choice of clusters 
may be between city blocks and census tracts. 

In stratified sampling, we arranged (classified) the population into groups based 
on a known variable in order to improve the efficiency of the sampling. We then 
selected a simple random sample from each group (stratum). In order to deal with 
the theory for stratified sampling, we had to label (number) each individual in a 
stratum. Identifying an individual required knowledge of both labels (h and i), not just
the single label (i) used in simple random sampling. Like stratified sampling,
cluster sampling requires two labels to identify a given observational unit, one to
identify the cluster containing the observational unit and the other to identify the
proper unit within the cluster (\(x_{ij}\)). We should stress that we need this double
labeling to provide a way of thinking about clusters that lends itself to the development
of the necessary theory. In practice we normally do not know how many
observational units are in a cluster unless we select it in the sample. For example, 
in obtaining information from residents of a given city block selected in a sample, 
we would find out how many people live on that block, but we would not bother 
to find out how many people live on blocks not in the sample. 

In cluster sampling, then, we describe the population as being composed of M
clusters, with the ith cluster containing \(N_i\) observational units. Using this new
labeling, we note

\[ N = \sum_{i=1}^{M} N_i \tag{14.7.1} \]

(even though the \(N_i\)'s are unknown)

\[ T = \sum_{i=1}^{M} \sum_{j=1}^{N_i} x_{ij} \tag{14.7.2} \]

(We add the values of observational units to obtain cluster totals and then add
cluster totals.)

\[ \mu = \frac{1}{N} \sum_{i=1}^{M} \sum_{j=1}^{N_i} x_{ij} \tag{14.7.3} \]

\[ T_{i\cdot} = \sum_{j=1}^{N_i} x_{ij} \tag{14.7.4} \]

(We obtain the total of the ith cluster by adding the values of all observational
units in the ith cluster.)

In simple cluster sampling, we are dealing with clusters and cluster totals. We 
select m out of M clusters as the sample. For each cluster selected, we use all 
observational units in the cluster. If we draw repeated samples, the variation in 
sample results will be due solely to which clusters we select. At this point, therefore,
it is useful to define the variance of cluster totals.

We noted above (Equation 14.7.4) that a cluster total is the value we obtain by
adding the values of all observational units in a cluster. The variance of cluster
totals is given by

\[ S_b^2 = \frac{1}{M - 1} \sum_{i=1}^{M} (T_{i\cdot} - \bar{T}_{\cdot\cdot})^2 \tag{14.7.5} \]

where \(\bar{T}_{\cdot\cdot}\) is the average of the cluster totals. The subscript b indicates that the
variance is the variance between clusters.

Suppose that we want to estimate the total consumption T of a competitor’s 
product in a large city. We take a sample of m city blocks and question all persons 
living on each block about the amount of the competitor’s product used during a 
specified time period. We estimate the population total T as follows: 


1. We use Equation 14.7.4 to obtain the total consumption of persons living in
each block selected in our sample.

2. We obtain the total for the sample by adding the cluster totals (\(T_{i\cdot}\)) for all
clusters in the sample (Equation 14.7.2).

3. We estimate the average cluster total by dividing the sample total by the number
of clusters in the sample. That is,

\[ \bar{T}_{\cdot\cdot} = \frac{\sum_{i=1}^{m} T_{i\cdot}}{m} = \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{N_i} x_{ij} \]

4. Finally, we estimate the population total by multiplying the estimated average
cluster total by the number of clusters in the entire population. That is,

\[ \hat{T}_{cl} = M\bar{T}_{\cdot\cdot} = \frac{M}{m} \sum_{i=1}^{m} \sum_{j=1}^{N_i} x_{ij} \]

We obtain the variance of the estimator from the following:

\[ V(\hat{T}_{cl}) = M^2 \left(\frac{M - m}{M}\right) \frac{S_b^2}{m} \tag{14.7.6} \]

Since \(S_b^2\) is usually unknown, we must estimate it from the sample, as follows:

\[ s_b^2 = \frac{1}{m - 1} \sum_{i=1}^{m} (T_{i\cdot} - \bar{T}_{\cdot\cdot})^2 \tag{14.7.7} \]

so that

\[ \hat{V}(\hat{T}_{cl}) = M^2 \left(\frac{M - m}{M}\right) \frac{s_b^2}{m} \tag{14.7.8} \]


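We can use these results as the basis for estimating the population mean. However,
calculations of variances are unwieldy unless all clusters contain approximately
the same number of units. In the case of equal cluster sizes,

\[ N_i = \bar{N} = \frac{N}{M} \quad \text{for all clusters} \]

and

\[ \bar{x}_{cl} = \frac{\hat{T}_{cl}}{N} = \frac{\hat{T}_{cl}}{M\bar{N}} = \frac{1}{m\bar{N}} \sum_{i=1}^{m} \sum_{j=1}^{N_i} x_{ij} \tag{14.7.9} \]

\[ V(\bar{x}_{cl}) = \frac{M^2}{M^2\bar{N}^2} \left(\frac{M - m}{M}\right) \frac{S_b^2}{m} = \frac{1}{\bar{N}^2} \left(\frac{M - m}{M}\right) \frac{S_b^2}{m} \tag{14.7.10} \]

In general, we must estimate both \(S_b^2\) and \(\bar{N}\) from the sample results. The
100(1 - α)% confidence intervals for the population mean and total, respectively,
are given by

\[ \bar{x}_{cl} \pm z_{\alpha/2} \sqrt{V(\bar{x}_{cl})} \tag{14.7.11} \]

and

\[ \hat{T}_{cl} \pm z_{\alpha/2} \sqrt{V(\hat{T}_{cl})} \tag{14.7.12} \]

We use sample values to estimate unknown quantities.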


EXAMPLE 14.7.1 The manager of a chain of 300 quick-food establishments would
like to know the average age of the sales personnel. Each establishment employs
about 6 salespersons. The manager selects a random sample of 10 clusters (establishments)
and determines the ages of the employees in these 10 establishments.
Table 14.7.1 shows the results.

TABLE 14.7.1 Ages of sales personnel in 10 quick-food establishments

Establishment     1     2     3     4     5     6     7     8     9    10
                 16    21    18    15    16    19    19    16    19    21
                 18    20    18    17    21    15    15    15    20    16
                 17    18    18    20    17    18    20    16    16    20
                 19    16    18    20    19    18    17    21    22    21
                 16    19    17    21    19    16    19    24    16    21
                 15    15    15    18    17    18    15    17    20    16
Total           101   109   104   111   109   104   105   109   113   115

By Equation 14.7.9, the estimate of the mean age of the sales personnel is

\[ \bar{x}_{cl} = \frac{101 + 109 + \cdots + 115}{10(6)} = \frac{1080}{60} = 18 \]

In order to construct a confidence interval, we must first compute an estimate of
the between-cluster variance. Since \(\bar{T}_{\cdot\cdot} = 1080/10 = 108\), by Equation 14.7.7
we have

\[ s_b^2 = \frac{(101 - 108)^2 + (109 - 108)^2 + \cdots + (115 - 108)^2}{10 - 1} = 19.56 \]

We can now compute, by substituting our estimate of \(S_b^2\) into Equation 14.7.10,

\[ \hat{V}(\bar{x}_{cl}) = \frac{1}{6^2} \left(\frac{300 - 10}{300}\right) \frac{19.56}{10} = 0.0525 \]

The 95% confidence interval for the mean is

\[ 18 \pm 1.96\sqrt{0.0525}, \qquad 17.55,\ 18.45 \]


EXAMPLE 14.7.2 The chief accountant of a chain of 100 variety stores wishes to 
estimate the total dollar value of bad checks received from customers during the 
week before Christmas. Table 14.7.2 shows the information from a random sample 
of 10 stores (clusters). 

TABLE 14.7.2 Dollar value of bad checks received from customers, Example 14.7.2

Store               1     2     3     4     5     6     7     8     9    10
Total amount, $   125   100   130    95   110   105   100   110   115   120     Total: 1110

We first find

\[ \bar{T}_{\cdot\cdot} = \frac{\$1110}{10} = \$111 \]

from which we compute our estimate of the total,

\[ \hat{T}_{cl} = 100(\$111) = \$11{,}100 \]

To construct a confidence interval for the population total T, we first compute

\[ s_b^2 = \frac{(125 - 111)^2 + (100 - 111)^2 + \cdots + (120 - 111)^2}{10 - 1} = 132.22 \]

By Equation 14.7.8, we may now compute

\[ \hat{V}(\hat{T}_{cl}) = 100^2 \left(\frac{100 - 10}{100}\right) \frac{132.22}{10} = 118{,}998 \]

The 95% confidence interval for T, then, is

\[ \$11{,}100 \pm 1.96\sqrt{118{,}998} \]
\[ \$11{,}100 \pm 676, \qquad \$10{,}424;\ \$11{,}776 \]
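The calculations of Examples 14.7.1 and 14.7.2 follow the same pattern, which can be sketched in code. This is an illustrative sketch only: the function name `cluster_estimates` and its parameter names are assumptions introduced here, the mean is computed only when an average cluster size is supplied (the equal-cluster-size case of Equations 14.7.9 and 14.7.10), and z = 1.96 is the 95% reliability factor.

```python
from math import sqrt

def cluster_estimates(cluster_totals, M, avg_cluster_size=None, z=1.96):
    """Simple cluster sampling estimates from the sampled cluster totals.

    cluster_totals:   totals of the m sampled clusters
    M:                number of clusters in the population
    avg_cluster_size: common cluster size (N-bar); needed only for the mean
    """
    m = len(cluster_totals)
    t_bar = sum(cluster_totals) / m                  # average cluster total
    # Equation 14.7.7: between-cluster variance estimated from the sample
    s2_b = sum((t - t_bar) ** 2 for t in cluster_totals) / (m - 1)
    total = M * t_bar                                # estimated population total
    # Equation 14.7.8: estimated variance of the estimated total
    var_total = M**2 * ((M - m) / M) * s2_b / m
    half = z * sqrt(var_total)
    out = {"s2_b": s2_b, "total": total, "var_total": var_total,
           "ci_total": (total - half, total + half)}
    if avg_cluster_size is not None:
        mean = t_bar / avg_cluster_size              # Equation 14.7.9
        # Equation 14.7.10 with s2_b substituted for S_b^2
        var_mean = ((M - m) / M) * s2_b / (m * avg_cluster_size**2)
        half = z * sqrt(var_mean)
        out.update(mean=mean, var_mean=var_mean, ci_mean=(mean - half, mean + half))
    return out

# Example 14.7.1: ages in 10 of 300 establishments, about 6 salespersons each
ages = cluster_estimates([101, 109, 104, 111, 109, 104, 105, 109, 113, 115],
                         M=300, avg_cluster_size=6)
# Example 14.7.2: bad checks in 10 of 100 stores
checks = cluster_estimates([125, 100, 130, 95, 110, 105, 100, 110, 115, 120], M=100)
print(ages["mean"], ages["ci_mean"])
print(checks["total"], checks["ci_total"])
```

The code keeps the between-cluster variance unrounded, so the variance of the total in Example 14.7.2 comes out marginally larger than the hand-calculated figure, which uses 132.22.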




There is one big distinction between cluster sampling and stratified random
sampling. The main goal when we use stratification is to reduce sampling error.
The result, when we realize the objective, is a narrower confidence-interval estimate
of estimated parameters. By contrast, we usually use cluster sampling to
reduce costs and increase administrative efficiency. For a given sample size, cluster
sampling usually results in a less precise estimate (wider confidence interval)
than do simple and stratified random sampling.

Exercises

14.7.1 Given: M = 25, \(N_i = \bar{N} = N/M\) for all clusters, m = 5.

The following observations are obtained. Find: (a) \(\hat{T}_{cl}\), (b) \(\hat{V}(\hat{T}_{cl})\), (c) \(\bar{x}_{cl}\), (d) \(\hat{V}(\bar{x}_{cl})\).
Construct the 95% confidence interval for (e) T and (f) \(\mu\).

Sampled cluster    x_ij
1                  5, 3, 2, 5, 6, 3, 4, 4, 3, 5
2                  5, 9, 9, 10, 2, 2, 3, 4, 2, 1
3                  3, 5, 4, 6, 3, 2, 4, 3, 7, 6
4                  3, 3, 4, 3, 2, 1, 4, 7, 9, 8
5                  8, 1, 3, 4, 5, 7, 2, 4, 5, 8

14.7.2 In a study of radio advertising rates in a certain large area, an advertising agency
divides the area into 20 smaller areas (clusters), each of which contains about 10 radio
stations. The agency selects a sample of 6 clusters, and obtains the unit charges for a
certain type of spot announcement for each station in each cluster. The following data are
collected. Find: (a) \(\bar{x}_{cl}\) and (b) \(\hat{V}(\bar{x}_{cl})\). Construct the 95% confidence interval for \(\mu\).

                 Sampled cluster number
     1         2         3         4         5         6
 $2.50     $1.50     $4.40     $3.50     $1.50     $4.00
  3.50      1.50      6.00      3.60      3.10      3.40
  4.00      2.00      2.40      3.65      4.00      6.00
  5.00      2.00      3.00      4.00      4.40      5.00
  7.00      2.00      4.00      5.00      5.25      4.40
  7.00      2.25      3.00      7.50      3.00      6.50
 12.00      2.50      5.00      9.00      3.75      7.00
  3.00      3.50      9.00      4.00      6.00      5.30
  3.00      7.50      9.00      3.50      7.00     12.50
  3.00     10.00     10.00     15.00     12.00      9.25


14.8 SYSTEMATIC SAMPLING


Suppose that the head of a department store wants to obtain a rapid estimate of 
gross sales by checking 2% of the sales slips each week. The sales slips are filed 
chronologically by cash register each day. Developing a sampling frame and 
selecting a sample by any of the techniques discussed so far would be time-consuming
and costly. An alternative way of selecting a sample in this situation
is called systematic sampling. Since we want a sample of 1 out of every 50 sales 
slips, we randomly select 1 of the first 50 sales slips, say number 7, and every 
fiftieth sales slip after that (57, 107, . . .) until we cover the entire population. 












In general, if we want a sample that is 1/M of the total population, we select 
one of the first M items and every Mth item thereafter. 

When we contemplate systematic sampling, we consider the nature of the population
to be sampled. We can identify the following types.

1. A random population is one in which the sampling units as represented by the 
frame are in random order. 

2. A periodic population is one in which the sampling units as represented by 
the frame exhibit some type of cyclical variation. 

3. An ordered population is one in which the sampling units as represented by 
the frame are ordered according to magnitude. 

When we draw a single systematic sample from a random population, the 
formulas for the estimators, their variances, and the corresponding confidence 
intervals are the same as for simple random sampling. 

When we sample from an ordered or periodic population, the formulas for 
simple random sampling do not give satisfactory estimates. We may avoid the 
problem by the technique of repeated systematic sampling, by which we draw 
more than one systematic sample. 

Suppose, for example, that we have a population of N = 1600 sampling units,
from which we wish to select a sample of size n = 80. Suppose that we know
that the population is in random order. We can draw a single systematic sample
by selecting a random starting point from the first M = 1600/80 = 20 items, then
drawing every twentieth item thereafter. If we suspect that the population is not
in random order, we draw more than one systematic sample. Suppose that we
want to draw m = 10 systematic samples. Since we want a total sample size of
n = 80, each of the 10 systematic samples must contain 8 observations. We also
know that under these conditions there are a total of M = 1600/8 = 200 possible
systematic samples from which to draw the m = 10 we want. We select 10
random numbers between 1 and 200 to provide the starting points for our 10
systematic samples. To each of these starting points we add the constant 200 until
we have 10 samples of size 8. Suppose that the 10 random starting points are 74,
37, 91, 18, 120, 162, 27, 176, 24, and 145. Table 14.8.1 shows the random
numbers designating which sampling units we would select in each of the 10
systematic samples.

TABLE 14.8.1 Random numbers for selecting 10 systematic samples of size 8

Sample    Random            Second element    Third element          Eighth element
number    starting point    in sample         in sample        ...   in sample
 1         18               218               418              ...   1,418
 2         24               224               424              ...   1,424
 3         27               227               427              ...   1,427
 4         37               237               437              ...   1,437
 5         74               274               474              ...   1,474
 6         91               291               491              ...   1,491
 7        120               320               520              ...   1,520
 8        145               345               545              ...   1,545
 9        162               362               562              ...   1,562
10        176               376               576              ...   1,576
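The selection procedure just described can be sketched in code. This is a minimal illustration under stated assumptions: the function names `systematic_sample` and `repeated_systematic_samples` are hypothetical, units are numbered from 1, the starting points are drawn without repetition (the text does not say so explicitly, but its ten starting points are distinct), and the population size is assumed to divide evenly.

```python
import random

def systematic_sample(start, step, size):
    """Unit numbers of one systematic sample: start, start + step, ..."""
    return [start + k * step for k in range(size)]

def repeated_systematic_samples(N, n, m, seed=None):
    """Draw m systematic samples with total sample size n from N units.

    Each sample holds n // m units spaced M = N // (n // m) apart, where M is
    also the number of possible systematic samples (possible starting points).
    """
    if n % m != 0:
        raise ValueError("total sample size n must be divisible by m")
    size = n // m
    if N % size != 0:
        raise ValueError("population size N must be divisible by n // m")
    step = N // size
    rng = random.Random(seed)
    # m distinct random starting points between 1 and M (an assumption)
    starts = sorted(rng.sample(range(1, step + 1), m))
    return [systematic_sample(s, step, size) for s in starts]

# First row of Table 14.8.1: starting point 18, spacing 200, 8 units
print(systematic_sample(18, 200, 8))

# Ten systematic samples of size 8 from N = 1600 (random starting points)
for sample in repeated_systematic_samples(N=1600, n=80, m=10, seed=1):
    print(sample)
```

Passing a seed makes the selection reproducible, which is convenient when the choice of starting points has to be documented.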

If we think of the M = 200 possible systematic samples of size 8 as clusters
in the population and the m = 10 systematic samples as sample clusters, we may
use the formulas of Section 14.7 to obtain estimates of the parameters, their
variances, and the corresponding confidence intervals.

EXAMPLE 14.8.1 A firm employing 200 persons wishes to estimate the average
age of its employees. Each employee’s date of birth is recorded in the personnel
record. These records are filed alphabetically by department in a filing cabinet.
Table 14.8.2 shows the 200 employees’ ages as they would be found in the filing
cabinet.

TABLE 14.8.2 Ages of 200 employees

File  Age    File  Age    File  Age    File  Age    File  Age
001.  23     041.  59     081.  65     121.  47     161.  62
002.  38     042.  58     082.  42     122.  64     162.  17
003.  43     043.  65     083.  50     123.  55     163.  37
004.  49     044.  50     084.  44     124.  50     164.  23
005.  36     045.  46     085.  54     125.  65     165.  36
006.  43     046.  61     086.  17     126.  53     166.  43
007.  61     047.  30     087.  49     127.  32     167.  30
008.  31     048.  55     088.  38     128.  44     168.  41
009.  57     049.  26     089.  22     129.  38     169.  59
010.  61     050.  26     090.  56     130.  37     170.  63
011.  25     051.  62     091.  46     131.  24     171.  35
012.  64     052.  25     092.  27     132.  44     172.  32
013.  60     053.  28     093.  64     133.  27     173.  32
014.  37     054.  34     094.  19     134.  40     174.  31
015.  47     055.  53     095.  64     135.  43     175.  35
016.  38     056.  25     096.  46     136.  45     176.  22
017.  32     057.  33     097.  59     137.  33     177.  58
018.  56     058.  25     098.  60     138.  22     178.  31
019.  49     059.  25     099.  46     139.  62     179.  23
020.  43     060.  25     100.  27     140.  42     180.  29
021.  48     061.  34     101.  59     141.  51     181.  43
022.  38     062.  19     102.  43     142.  49     182.  17
023.  47     063.  23     103.  50     143.  31     183.  53
024.  47     064.  42     104.  51     144.  64     184.  64
025.  57     065.  48     105.  18     145.  36     185.  38
026.  54     066.  37     106.  59     146.  59     186.  36
027.  31     067.  26     107.  33     147.  54     187.  59
028.  36     068.  55     108.  60     148.  24     188.  48
029.  54     069.  36     109.  26     149.  60     189.  26
030.  31     070.  65     110.  18     150.  65     190.  59
031.  57     071.  63     111.  25     151.  36     191.  29
032.  35     072.  48     112.  18     152.  25     192.  23
033.  24     073.  50     113.  20     153.  25     193.  22
034.  62     074.  49     114.  19     154.  56     194.  63
035.  44     075.  17     115.  31     155.  51     195.  48
036.  32     076.  31     116.  56     156.  53     196.  57
037.  30     077.  26     117.  37     157.  40     197.  25
038.  33     078.  23     118.  49     158.  33     198.  50
039.  50     079.  63     119.  55     159.  26     199.  60
040.  62     080.  37     120.  57     160.  42     200.  28

The firm decides to take a 10% sample, which will result in a total sample size
of 20. It also decides to use systematic sampling. It chooses four systematic
samples of size 5. Since 200/5 = 40 possible systematic samples of size 5 can
be drawn from a population of size 200, the firm needs 4 random starting points
between 1 and 40. Using a table of random numbers, it selects the following
starting points: 10, 37, 8, and 12. The samples consist of the elements shown in
Table 14.8.3.

Our estimate of the population mean, by Equation 14.7.9, is

\[ \bar{x}_{cl} = \frac{243 + 158 + 209 + 192}{4(5)} = \frac{802}{20} = 40.1 \]

In order to construct a confidence interval for \(\mu\), we first compute

\[ \bar{T}_{\cdot\cdot} = \frac{243 + 158 + 209 + 192}{4} = 200.5 \]

By Equation 14.7.7 we compute

\[ s_b^2 = \frac{(243 - 200.5)^2 + (158 - 200.5)^2 + \cdots + (192 - 200.5)^2}{4 - 1} = 1252.33 \]

and by Equation 14.7.10,

\[ \hat{V}(\bar{x}_{cl}) = \frac{1}{(5)^2} \left(\frac{40 - 4}{40}\right) \frac{1252.33}{4} = 11.27 \]

The 95% confidence interval for \(\mu\), then, is

\[ 40.1 \pm 1.96\sqrt{11.27} \]
\[ 40.1 \pm 6.6, \qquad 33.5,\ 46.7 \]


TABLE 14.8.3 Four systematic samples of size 5 selected from Table 14.8.2

     Sample 1          Sample 2          Sample 3          Sample 4
File #   Age      File #   Age      File #   Age      File #   Age
  10      61        37      30         8      31        12      64
  50      26        77      26        48      55        52      25
  90      56       117      37        88      38        92      27
 130      37       157      40       128      44       132      44
 170      63       197      25       168      41       172      32
Total    243               158               209               192

Because we can draw a systematic sample with relative ease and efficiency, it
is an attractive alternative to other sampling procedures when there is a readily
available sampling frame. Although other sampling procedures require us to actually
assign a number to each sampling unit, systematic sampling requires only
that we count sampling units. For example, a systematic sampling plan may call
for the selection of the 7th, the 27th, the 47th, . . . sampling unit, and so on,
until the entire population has been covered. To select a sample in this manner,
we do not even need to know the population size. However, sampling units must
actually be available for counting. In practice, systematic sampling is widely used
as an alternative to simple random sampling because of its convenience and simplicity.
As a rule, we analyze the systematic sample as though it were a simple
random sample. As we have indicated, for this approach to be valid, the sampling
units in the frame must be in random order. The investigator, therefore, should
use caution when following this approach, since it is often hard to find out for
certain that the sampling units in a frame are indeed in random order. In general,
the best practice is to draw several systematic samples, as illustrated in Example
14.8.1.

Exercises

14.8.1 Using a table of random numbers, select four new systematic samples from Table
14.8.2. Construct a 95% confidence interval for \(\mu\), using the results of the new samples.

14.8.2 Suppose that you want to take a 15% sample. How many systematic samples would
you suggest taking? Randomly select the number of starting points indicated by your answer
and use the sample data to construct a 95% confidence interval for \(\mu\).

14.9 COSTS, EFFICIENCY, AND SAMPLE SIZE


Up to this point, we have disregarded costs and the efficiency of sampling procedures.
Both are important practical considerations. If we are to use sampling to
obtain needed data, we need at least rough estimates of the costs of sampling and
the quality of the estimate that we will obtain. The desired quality of the estimate
dictates the sample size, which, in turn, determines costs. For the types of sampling
we have discussed, we can break the total cost C for a sampling study into
two basic components: one part that is a function of the number of observational
units, \(C_u\), and another part, fixed cost, that is relatively independent of the number
of observational units, \(C_f\). In stratified sampling, the unit cost may vary considerably
from one stratum to another. For example, one stratum may be a rural area
and a second stratum an urban area. When there is no cost differential, we can
use the formula for simple random sampling.

Cost formulas for the different sample designs are as follows:

Simple random sampling:

\[ C = C_f + nC_u \tag{14.9.1} \]

Stratified sampling:

\[ C = C_f + \sum_{h=1}^{L} n_h C_{uh} \tag{14.9.2} \]

Cluster sampling:

\[ C = C_f + \sum_{i=1}^{m} N_i C_{ui} \tag{14.9.3} \]

Systematic sampling:

\[ C = C_f + \sum_{i=1}^{m} N_i C_{ui} \tag{14.9.4} \]
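Since all four cost formulas share one structure, a single helper can cover them. The sketch below is illustrative and assumes, as the discussion of cost components indicates, that total cost is the fixed cost plus the number of units in each group (the whole sample, a stratum, a cluster, or a systematic sample) times that group's cost per unit. The name `total_cost` and the figures in the usage lines are hypothetical.

```python
def total_cost(fixed_cost, units, unit_costs):
    """Total survey cost: fixed cost plus sum of (units x cost per unit).

    units and unit_costs are parallel sequences with one entry per group:
    one entry for simple random sampling, one per stratum for stratified
    sampling, one per cluster or per systematic sample otherwise.
    """
    if len(units) != len(unit_costs):
        raise ValueError("units and unit_costs must have the same length")
    return fixed_cost + sum(n * c for n, c in zip(units, unit_costs))

# Hypothetical simple random sample: 200 units at $12 each, $800 fixed cost
print(total_cost(800, [200], [12]))

# Hypothetical stratified sample: three strata with different unit costs
print(total_cost(1000, [40, 60, 100], [15, 10, 8]))
```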
The efficiency of a sample design is associated directly with the variance of the
estimator involved. In choosing between alternative sample designs (for example,
simple random sampling and cluster sampling), the design that has the smaller
variance for its estimator for the same cost would be considered the better design.

When we have a fixed budget for a sampling study, we calculate the sample
size in the following manner:

\[ n = \frac{C - C_f}{C_u} \quad \text{(simple random sampling)} \tag{14.9.5} \]

\[ m = \frac{C - C_f}{\bar{N}C_u} \quad \text{(cluster sampling and systematic sampling)} \tag{14.9.6} \]

\[ n = \frac{C - C_f}{C_u} \quad \text{(stratified sampling, if all } C_{uh} \text{ are constant)} \tag{14.9.7} \]

See the references at the end of this chapter for those cases in which stratification
is used and not all \(C_{uh}\) are equal.

In situations in which the quality of the estimator obtained, rather than the cost,
is the main consideration, we determine sample size as described in the next
sections.

Simple Random Sampling

In simple random sampling, the formula for n is given by

\[ n = \frac{Nz^2S^2}{Nd^2 + z^2S^2} \tag{14.9.8} \]

where d = maximum desirable sampling error, z = value of normal deviate as
determined by portion of time d can be exceeded, \(S^2\) = population variance, and
N = size of population. When \(S^2\) is unknown, we use its estimator \(s^2\), computed
from a pilot sample, instead.

Equation 14.9.8 differs from the formula for n given in Chapter 6. Equation
14.9.8 incorporates the finite population correction factor. In Chapter 6 we assumed
that n/N < 0.05 and that therefore we could use a simpler formula. Since
Equation 14.9.8 uses \(S^2\) rather than \(\sigma^2\), it also differs from Equation 6.9.4.

Stratified Sampling

In stratified sampling, the total sample size depends not only on the nature of the
population being sampled, but also on the way in which we distribute (allocate)
the observational units among the various strata. There are a number of ways to
allocate a given total sample size to the different strata. Which of these is the best
method depends on the population being sampled. The determination of total
sample size and allocation of sample sizes to strata are shown here for the four
most commonly used allocation methods.

I. Equal size subsamples selected from each stratum: 


(allocation formula) 


(14.9.9) 


n = 


N 2 d 2 + z 2 V h=l N h Sl 


(total-sample-size formula) 


(14.9.10) 



Cluster and 

Systematic 

Sampling 


2. Subsamples allocated to the strata in proportion to the stratum sizes: 


n h - — 5 n (allocation formula) 


(14.9.11) 


\N h Sl 

N 2 d 2 + z r lk^ \N h Sl 


(total-sample-size formula) ( 14 . 9 . 12 ) 


3. Optimal allocation, which allows for variation in both cost and variance across 
strata: 


wvc: 


n (allocation formula) 


iv¥ + z 2 2L ,yv*sl 


(14.9.13) 


(14.9.14) 


(total-sample-size formula) 

4. Neyman allocation, which provides only for variability in variances across 
strata; all C u assumed equal for all strata: 


n h ~ ^ 1 —'-n (allocation formula) 


(14.9.15) 


N 2 d 2 + z 2 li = ,N„Sl 


(total-sample-size formula) 04 . 9 . 1 6) 


In practice, we substitute s\, sample estimates of stratum variances, for S\ in these 
formulas, since we usually do not know the population variances. 

In our treatment of cluster sampling, we have limited the discussion to studies in 
which we want to estimate population totals and, in the case of clusters approx¬ 
imately equal in size, population means. Systematic sampling qualifies as a special 
type of cluster sampling in which it is appropriate to estimate either totals or 
means. For these situations, we determine sample size in terms of number of 
clusters (or starting points) as follows: 


\[ m = \frac{M z^2 S_b^2}{M d_{cl}^2 + z^2 S_b^2} \tag{14.9.17} \]

where \(d_{cl} = \bar{N}d\) = maximum desirable sampling error in estimating the average
cluster total. 
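
The sample-size formulas of this section can be sketched in a few lines of code. The block below is a minimal illustration, not part of the text: the function names, the weight-based form of the stratified formula (which reduces to Equations 14.9.10, 14.9.12, 14.9.14, and 14.9.16 for the four allocation methods), and all input values are assumptions made for the example.

```python
from math import ceil, sqrt


def allocation_weights(method, N_h, S2_h=None, C_h=None):
    """Return the fractions w_h = n_h / n for one of the four allocation methods."""
    if method == "equal":
        raw = [1.0] * len(N_h)
    elif method == "proportional":
        raw = [float(N) for N in N_h]
    elif method == "optimal":
        raw = [N * sqrt(S2) / sqrt(C) for N, S2, C in zip(N_h, S2_h, C_h)]
    elif method == "neyman":
        raw = [N * sqrt(S2) for N, S2 in zip(N_h, S2_h)]
    else:
        raise ValueError(f"unknown allocation method: {method}")
    total = sum(raw)
    return [r / total for r in raw]


def stratified_total_n(N_h, S2_h, weights, d, z):
    """Total sample size (unrounded) for estimating the mean to within d."""
    N = sum(N_h)
    numerator = z ** 2 * sum(Nh ** 2 * S2 / w for Nh, S2, w in zip(N_h, S2_h, weights))
    denominator = N ** 2 * d ** 2 + z ** 2 * sum(Nh * S2 for Nh, S2 in zip(N_h, S2_h))
    return numerator / denominator


def cluster_m(M, S2_b, d_cl, z):
    """Number of clusters (unrounded) as in Equation 14.9.17."""
    return M * z ** 2 * S2_b / (M * d_cl ** 2 + z ** 2 * S2_b)


# Hypothetical inputs, chosen only for illustration (not taken from the text).
N_h = [300, 500, 200]        # stratum sizes
S2_h = [25.0, 100.0, 64.0]   # stratum variances
C_h = [4.0, 9.0, 16.0]       # cost per sampling unit in each stratum
d, z = 1.0, 1.96             # desired precision and reliability coefficient

sizes = {}
for method in ("equal", "proportional", "optimal", "neyman"):
    w = allocation_weights(method, N_h, S2_h, C_h)
    n = ceil(stratified_total_n(N_h, S2_h, w, d, z))  # round up to whole units
    # Rounded stratum sizes; after rounding they need not add up exactly to n.
    sizes[method] = (n, [round(n * wh) for wh in w])
    print(method, sizes[method])

m = ceil(cluster_m(M=50, S2_b=150.0, d_cl=4.0, z=1.96))
print("clusters to sample:", m)
```

When costs are ignored, Neyman allocation never requires a larger total sample than the other methods for the same precision, which is one way to check such a calculation.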


14.9.1 Compute the total cost of conducting a sample survey under the following conditions:
(a) Simple random sampling, where fixed costs are $1000, each sampling unit costs
$10 to obtain, and a sample of size 150 is required. (b) Stratified sampling, with fixed
costs of $1500 and information regarding strata as follows.

Stratum


Cost per unit 


(c) Cluster sampling, with $1200 fixed costs, in which we select 5 clusters, each containing 
10 sampling units and with costs as follows. 

Cluster             1    2    3    4    5

Cost per unit, $    3    7    8    5    5

(d) Systematic sampling, with fixed costs of $500, in which we select 5 random starting 
points, each of the 5 samples contains 20 sampling units, and the costs per unit are as 
follows. 


Systematic sample 1 2 3 4 5 

Cost per unit, $ 10 10 10 5 5 

14.9.2 Determine what size sample can be taken under each of the following circumstances:
(a) Simple random sampling: (1) total budget = $3500; (2) fixed costs = $1000;
(3) cost per sampling unit = $15. (b) Cluster sampling: (1) total budget = $5000; (2)
fixed costs = $1500; (3) cost per observational unit = $5; (4) number of observational
units per cluster = 7. (c) Stratified random sampling: (1) total budget = $3500; (2) fixed
costs = $1500; (3) \(C_h\) and \(S_h^2\) equal for all strata; (4) cost per sampling unit = $10.

14.9.3 Determine the sample size to be taken under the following circumstances: (a)
Simple random sampling: (1) a 95% confidence interval for \(\mu\) is desired; (2) a precision
of 2.5 units is desired; (3) an estimate of \(S^2\) from a previous study is \(s^2 = 100\); (4) \(N = 5000\).
(b) Cluster sampling (determine \(m\)): (1) \(M = 30\); (2) \(d_{cl} = 5\); (3) a 95% confidence
interval for \(\mu\) is desired; (4) an estimate of \(S_b^2\) is 100.

14.9.4 Determine the sample size needed when we use stratified sampling and the method
of allocation is as indicated: (a) Equal allocation: (1) a 95% confidence interval for \(\mu\) is
desired; (2) desired precision = 3.0; (3) the following data are available.


Stratum

\(N_h\)

\(S_h^2\)


5 

250 Total: 925 
275 


(b) Proportional allocation: (1) a 95% confidence interval for \(\mu\) is desired; (2) desired
precision = 3; (3) the following information is available.


Stratum

\(N_h\)


Total: 1000 


(c) Optimal allocation: (1) a 95% confidence interval for \(\mu\) is required; (2) desired precision
= 5; (3) the following information is available.


Stratum          1     2      3

\(N_h\)        400   700    900    Total: 2000

\(C_h\)          2     4      6

\(S_h^2\)      200   800   3100

(d) Neyman allocation: (1) a 95% confidence interval for \(\mu\) is desired; (2) desired precision
= 5; (3) the following information is available.


Stratum          1     2      3

\(N_h\)        400   700    900    Total: 2000

\(S_h^2\)      200   800   3100

14.9.5 Using the values of n found in steps (a), (b), (c), and (d) of Exercise 14.9.4,
allocate the sample to the strata by the equal, proportional, optimal, and Neyman methods,
respectively.


14.10 NONPROBABILITY SAMPLING PROCEDURES

Nonprobability samples include judgment samples, quota samples, and convenience
samples. When the subjective judgment of the sampler determines which
items make up the sample, we call the result a judgment sample. Suppose that a
market-research team wishes to use a judgment sample as the basis for making
inferences about the buying habits of families living in a certain town. The researchers
select what, in their judgment, is a sample of families representative of
all families in the town. Another team of researchers would, no doubt, select a
different judgment sample. It is unlikely that any two teams would ever agree on
what constitutes a representative sample.

A quota sample is selected on the basis of more specific guidelines about which
items should be drawn. The quota sampler must know, for the population of
interest, the proportion of the items with certain characteristics. Suppose, for
example, that a real estate appraiser wants to estimate the mean value of the
houses in a population of single-family dwellings. If quota sampling is to be used,
the appraiser may need to know, for the population, what proportion of houses
are two-story, what proportion are split-level, what proportion have central air
conditioning, and what proportion have swimming pools. A quota sample should
contain dwellings with these characteristics in the same proportions as the population.
Within each category, the appraiser, who must know which houses in the
population have these characteristics, can use subjective judgment in deciding
which ones to include in the sample.

As the name implies, convenience samples are used because of their convenience.
A market researcher who wishes to draw a sample of families living in
some subdivision might find it convenient to select those families living on the
main street of the subdivision. Although convenience samples may serve some
specialized purpose, we cannot, in general, depend on them for making inferences
about populations.

Nonprobability samples do not have the objectivity of selection that is an essential
characteristic of probability samples. Unlike probability samples, nonprobability
samples do not yield estimates to which we can attach statements of
confidence. This is why probability samples rather than nonprobability samples
are the primary focus of attention in this text.


Summary

Probability sampling is one of the most useful management tools. It is being used 
increasingly to obtain information needed for decision making. We have discussed
only the basic techniques and fundamental concepts of probability sampling. You
should be aware that you may need far more sophisticated sample designs and 
estimation techniques to provide the information you need in many situations. 
[For these cases, see the sampling textbooks available, such as those by Cochran 
(1977), Cyert and Davidson (1962), Deming (1960, 1966), Hansen, Hurwitz, and 
Madow (1953), Kish (1965), Schaeffer, Mendenhall, and Ott (1979), Raj (1968), 
Sampford (1962), Sukhatme and Sukhatme (1970), Yamane (1967), and Yates 
(1960).]

The discussion in this chapter was limited to the estimation of population means 
and totals. We are often interested in estimating other parameters from the data 
of stratified, cluster, and systematic samples. The population proportion, for example,
is of considerable interest. Although, in the interest of space, this chapter
did not cover the procedures for estimating population proportions, the techniques 
are logical extensions of those we use for estimating population means. See the 
above references for a detailed discussion of the procedures. 

1. Define each of the following items: (a) observational unit, (b) sampling unit, (c) 
universe, (d) population, (e) sample, (f) sampling frame, (g) sampling fraction, (h) 
cluster, (i) cluster sampling, (j) stratified sampling, (k) variance, (l) variance of an
estimator, (m) probability sampling, (n) gap. 

2. List and discuss the steps involved in a sample survey. 

3. Explain why the formulas used in cluster sampling are also applicable in systematic 
sampling. 

4. What considerations dictate the use of stratified random sampling rather than simple 
random sampling? 

5. When is it advantageous to use cluster sampling? 

6. When is the use of systematic sampling desirable? 

7. Describe a business-oriented situation in which stratified random sampling would be 
appropriate. Identify the variable of interest and the stratifying variable. Be able to justify 
the appropriateness of stratified random sampling to the situation and your choice of a 
stratifying variable. Use real or realistic data and follow the procedure of constructing a 
frame, drawing a stratified sample, and constructing a confidence interval for the population 
mean and/or total. 

8. Describe a business situation in which cluster sampling would be appropriate. Justify 
the use of cluster sampling. Use real or realistic data to carry out the procedure of constructing
a frame, drawing a cluster sample, and constructing a confidence interval for the
population mean and/or total. 

9. Describe a business situation in which systematic sampling would be appropriate. 
Justify the use of systematic sampling, and carry out the procedure of constructing a frame, 
drawing a systematic sample, and constructing a confidence interval for the population 
mean and/or total. 

10. An agricultural economist wishes to analyze farmers’ annual expenditures for veterinary
services in a certain state. Since the amount spent for veterinary services varies greatly
with farm size, the economist decides to use stratified sampling, with farm size as the
stratifying variable. The economist obtains the following data (coded for ease of calculation).
Find: (a) \(\hat{T}_{st}\), (b) \(\bar{x}_{st}\), (c) \(\hat{V}(\hat{T}_{st})\), (d) \(\hat{V}(\bar{x}_{st})\). Construct the 95% confidence interval
for (e) \(\mu\) and (f) \(T\).

Stratum    \(N_h\)    \(n_h\)    \(\bar{x}_h\)    \(s_h^2\)

I             85         17          15              20
II            75         15          20              15
III           54         11          30              20
IV            36          7          40              15

Total        250         50




11. A representative of a certain industry wishes to estimate the mean number of sick-leave
days accrued by industry employees in managerial positions. The industry consists
of 300 manufacturing firms located throughout the United States. Each firm employs about
6 people in management positions. The investigator wants to obtain the results quickly
and cheaply, and thus uses cluster sampling, with each firm serving as a cluster. A simple
random sample of 10 firms is selected. The following table shows the number of sick-leave
days accrued by the management personnel. Find: (a) \(\bar{x}_{cl}\), and (b) \(\hat{V}(\bar{x}_{cl})\). Construct
the 95% confidence interval for \(\mu\).

Firm       1     2     3     4     5     6     7     8     9    10

          15    21    18    17    19    15    18    18    20    16
          19    21    18    19    16    19    21    16    16    15
          17    16    15    19    15    16    20    18    22    16
          20    21    18    17    17    18    20    18    16    21
          15    16    18    21    18    20    17    15    20    24
          19    20    17    16    16    21    15    19    19    17

Total    105   115   104   109   101   109   111   104   113   109


12. The head of a large firm’s accounting department wants to estimate the mean length
(in minutes) of outgoing long-distance telephone calls. An accounting clerk selects 5
systematic samples of size 9 from the past year’s long-distance records and records the
length of each call in the samples. The results are as follows. Construct a 95% confidence
interval for \(\mu\) (\(M = 100\)).

Sample    Length of call (minutes)

1          7.1   10.4    3.6    2.4    1.9    4.6    6.3    3.4    2.9
2         10.3    7.0    4.0    9.8    4.1    9.2   12.1    8.5    8.0
3          4.8    7.1    2.5    6.2   12.2    2.8    9.6    1.6    5.8
4          3.7    6.9    5.1    3.1    2.4    5.4    5.5    6.7    2.6
5          2.3    2.8    8.3    3.1    8.7    7.6    4.0   15.5    1.4


13. A real estate developer who wants to estimate the mean value per acre of farmland
in a certain area uses stratified random sampling, with distance from a major recreation
center as the stratifying variable. The investigation yields the following results. Compute
\(\bar{x}_{st}\) and construct the 95% confidence interval for \(\mu\).

Stratum    \(N_h\)    \(n_h\)    \(\bar{x}_h\)    \(s_h^2\)

A            150         15        $500            $1000
B             80          8         700             1200
C             70          7        1000             2500
D             50          5        2000             4000
E             50          5        3000             5000

Total        400         40






14. The manufacturers of plastic trash bags wish to estimate the mean tensile strength in
a shipment of 1000 packages of their product. They employ cluster sampling, with a
package of bags serving as a cluster. There are 40 bags per package. A random sample
of 10 packages (clusters) yields the following results (coded for ease of computation).
Compute \(\bar{x}_{cl}\) and construct the 95% confidence interval for \(\mu\).

Package            1     2     3     4     5     6     7     8     9    10

Cluster total    100   105    99   101    98    95   102   100   103    97

15. From the population of employed heads of households in Appendix II, select a single 
systematic sample of size 50. Construct a 95% confidence interval for the mean annual 
salary of the persons in the population. Assume that the population is in random order. 
Use the table of random numbers to select your starting point. 

16. From the population of employed heads of households in Appendix II, select 5 systematic
samples in such a way that you get a total of 50 units. Use the results to construct
a 95% confidence interval for the mean number of years with current employer for the 
persons in the population. 

17. Select a stratified random sample from the population of employed heads of households 
in Appendix II. Use occupation as the stratifying variable, and construct a 95% confidence 
interval for the mean annual salary of the persons in the population. Use a total sample 
size of 60, and use proportional allocation. (Without the aid of a computer, this exercise 
may require considerable time.) 



How Long Should a Questionnaire Be? 


The length of a questionnaire is itself a question that plagues the questioners. 
What effect does length of a questionnaire have on quality of response? Some 
researchers feel that there is a maximum length for questionnaires, which—if
it is exceeded—impairs the quality of the responses. Others believe that under 
certain conditions questionnaires can be quite lengthy without any adverse 
effect on the quality of the responses. 

Herzog and Bachman* report the results obtained when both long and short 
questionnaires were administered to high school seniors. They found that "on 
the whole, the comparisons of long and short forms revealed rather little evidence
of systematic differences. . . ." Do these results surprise you?

The short form was designed to be completed in 45 minutes. The long form 
required, on the average, more than two hours. What sort of incentives do 
you think one would have to offer to high school seniors to ensure that they 
would give careful, correct responses to a questionnaire of this length? The 
high school students who completed Herzog and Bachman's long questionnaire
were paid $5 and given released time from class. In your opinion, would these
incentives provide adequate motivation for high-quality responses?

*A. Regula Herzog and Jerald G. Bachman, "Effects of Questionnaire Length on Response Quality," Public
Opinion Quarterly, 45 (Winter 1981), 549-559.

Herzog and Bachman did find that for long sets of items, using the same 
response scales, the students tended to use the same response category for all 
items in such a set. Herzog and Bachman call this kind of response "straight-line"
responding. Why do you think students would respond this way?

What other factors—besides monetary rewards and incentives like released 
time from class—do you think would motivate students to give high-quality 
responses to long questionnaires? 

Have you had any experience in filling out survey questionnaires? How do 
you feel about the effect of length of a questionnaire on the quality of your 
responses? Have you ever refused to fill out a questionnaire because you thought 
it was too long? Would you consider a questionnaire that takes 45 minutes to 
complete a "short" questionnaire? 


The Order of Questions in a Questionnaire 


The design of questionnaires is an important aspect of sample surveys. One 
should design questionnaires with care, so that responses will accurately reflect 
the knowledge and/or opinions of respondents. It goes without saying that 
questions should be clear, unambiguous, and couched in language appropriate 
to the respondents' education and socioeconomic status. But, according to 
McFarland,* there is another important consideration when one is designing 
a questionnaire: the order in which the questions are asked. He conducted a 
telephone survey of 516 respondents, in which he studied the effects on respondents'
attitudes toward certain issues when: (1) A general question preceded
specific questions, and (2) specific questions preceded the general question. 

McFarland found that respondents expressed significantly more interest in 
certain issues (politics and religion) when the general question followed specific
questions on the same issues. Why do you think this was the case? Or do
you think that respondents are more likely to give "true" responses to a general
question when it precedes specific questions?

McFarland also found that the effects of order of questions were consistent 
for both sexes and across education levels. What implications do these findings 
have for the design of questionnaires? 

McFarland's findings applied to politics and religion. What other issues do 
you think would yield the same results with respect to order of questions? 
What issues do you think would yield opposite results? 

*Sam G. McFarland, "Effects of Question Order on Survey Responses," Public Opinion Quarterly, 45 (Summer
1981), 208-215.



15. Statistical Decision Theory

Chapter Objectives: The main objective of this text 
is to give you some tools that will help you, as a 
business person, to make decisions. In this chapter you 
get a closer look at the decision process and add some 
more techniques to your arsenal of decision-making 
tools. After a careful study of this chapter, you should 
understand the following concepts and be able to use 
them in making decisions. 

1. The payoff table 

2. The maximin criterion 

3. The minimax criterion 

4. The maximax criterion 

5. The Hurwicz criterion 

6. The Bayes criterion 

7. Utility theory 



15.1 INTRODUCTION 


Chapter 7 covered hypothesis testing, one of the two major areas of statistical
inference, in considerable detail. This type of inference is known as classical
statistical inference. The final step in the proposed hypothesis-testing procedure
was designated as “making the administrative decision,” a step that involves the
making of a decision. Classical statistical inference thus provides a theory and
methodology that we can use to obtain information on which to base a decision.

This chapter considers another theory and methodology that the decision-maker
can use. This alternative to classical statistical inference as a decision-making tool
is known as statistical decision theory. We have already discussed two concepts
basic to the theory. These are the concept of subjective probability and the concept
underlying Bayes’ theorem, including the mathematical formula from which the
theory gets its name.

As Chapter 3 indicated, the theory of subjective probability espoused by L. J.
Savage (1972) is not admissible in classical statistical inference, which relies on
the concepts of objective probability. Savage, on the other hand, sees reliance on
repetitive events as a weakness of the objective view of probability. After introducing
his concept of personal probability, he states his belief that this is the only
probability concept essential to those activities that utilize probability. He cites
the personalistic view of probability contained in the works of de Finetti (1937,
1950) as forming the basis of his theory.

Bayes’ theorem, the other basic concept on which statistical decision theory is
based, was covered in some detail in Chapter 3. In Section 15.2, we shall see the
use that Bayesian practitioners make of this theorem.

[The treatment of statistical decision theory in this text serves only as an introduction
to the subject. For a more complete coverage, see the books by Newman
(1971), Sasaki (1968), Schmitt (1969), Thompson (1972), and Winkler (1972).
Readers with a strong mathematics background may wish to consult the works of
Schlaifer (1969, 1981), Chernoff and Moses (1959), Blackwell and Girshick (1954),
and Weiss (1961).]


15.2 SOME BASIC IDEAS


The Environment of Uncertainty

The typical decision-maker in a business organization operates in an environment 
characterized by some degree of uncertainty. This uncertainty may concern the 
present situation, the future outcome of a decision, or both. 

The following are examples of the types of decisions that management may 
face and that typically contain some degree of uncertainty: How will potential 
consumers react to a new product that is ready to be placed on the market? Should 
a company spend more money on TV advertising than on newspaper advertising? 
A firm is moving into a new building. Should the old building be sold or leased? 
How many units of a product should a manufacturer produce? How many units 
of a particular part should be kept in inventory? The list of examples is endless. 



The Payoff Table 


In each situation calling for a decision, there are alternative acts that may be
pursued. A firm can place a new product on the market, withhold it from the
market, or make a decision after obtaining more information. A firm may channel
the bulk of its advertising budget into TV advertising. Or it may spend less on
TV advertising and more on newspaper advertising. Or it may decide to do further
research and then make the decision. An old building may be leased, sold, or
leased for a while and then sold. A firm can increase production, cut back, or
hold the line. Inventory can be kept at a high level, kept at a low level, or parts
may be ordered or manufactured as needed. Choosing the best act from the alternatives
available is the decision-maker’s responsibility.

Associated with each act is a payoff, or consequence, resulting from the particular
act. The payoff may be positive, negative, or 0. In an environment of
uncertainty, the nature of the payoff is unknown in advance.

The nature of the payoff is determined by the event or outcome of the decision
that is made. We can refer to outcomes of decisions as states of nature. If a firm
places a new product on the market, the product may be well received or it may
not appeal to the consumer. Potential customers for a firm’s line may be primarily
television viewers, or they may be non-television watchers who spend a lot of
time reading the newspaper. An old building may command a good rent over a
long period, or it may be located in a part of town that is beginning to deteriorate
and thus is less attractive to potential tenants. Future demand for a certain product
may be such that it would be profitable to increase production. Or it may be such
that the most profitable course of action would be to maintain the present level of
production, or slow down. The price of parts may go up in the future, so that it
may be most economical to carry as large a quantity in stock as possible. Storage
costs per item, however, may be greater than any expected future price increase.

The decision-maker needs to have the most potent tools available in order to
make the best choices. Previous topics in this text have provided many of these
tools. In this chapter, we suggest additional ones.

A convenient way to visualize the relationships between acts, events, and the
consequences of any combination of act and event is to prepare a payoff table.
Table 15.2.1 shows a generalized payoff table. The column headings of the table
indicate the various acts from which the decision-maker may choose. The row
headings show the possible events or states of nature that may exist after the act
has been executed. In the body of the table, the symbol \(p_{ij}\) indicates the payoffs
for the various act-event combinations, where \(i\) tells in what row and \(j\) tells in
what column a particular \(p\) is located. The symbol \(p\) stands for payoff. To determine
the payoff when act \(A_1\) is taken and event \(E_1\) occurs, we locate the intersection
of row 1 and column 1 and find the payoff to be \(p_{11}\). The payoff for act
\(A_3\) and event \(E_2\) is located at the intersection of row 2 and column 3, and we see
that it is \(p_{23}\). We can find payoffs for all act-event combinations in a similar
manner.

TABLE 15.2.1 Payoff table

                                    Acts
Events      \(A_1\)      \(A_2\)      \(A_3\)      ...    \(A_n\)

\(E_1\)     \(p_{11}\)   \(p_{12}\)   \(p_{13}\)   ...    \(p_{1n}\)
\(E_2\)     \(p_{21}\)   \(p_{22}\)   \(p_{23}\)   ...    \(p_{2n}\)
\(E_3\)     \(p_{31}\)   \(p_{32}\)   \(p_{33}\)   ...    \(p_{3n}\)
...
\(E_m\)     \(p_{m1}\)   \(p_{m2}\)   \(p_{m3}\)   ...    \(p_{mn}\)

We can show a decision-analysis situation graphically by means of a tree diagram.
When we construct such tree diagrams, the possible acts appear as the first
branches. The possible events associated with each act are shown as second-level
branches emanating from the appropriate act branch. Figure 15.2.1 shows a tree
diagram for a decision involving three acts and two events. The following examples
explain the construction and use of payoff tables.

FIGURE 15.2.1 Tree diagram for a three-act, two-event decision situation

EXAMPLE 15.2.1 A company is about to place a new product on the market. There
are two courses of action: (1) place the product on the market, or (2) do not place
it on the market. For simplicity, assume that three events are possible: (1) demand
for the product will be strong, (2) demand will be weak, or (3) there will be no
demand. Suppose that it has been determined that if the company places the
product on the market and demand is strong, the payoff will be $10,000. If the
demand is weak, the payoff will be $2000. If there is no demand, the payoff will
be -$3000, the cost of marketing the product. Of course, if the company does
not market the product, the payoff will be 0 in any event. Table 15.2.2 shows
the situation. Figure 15.2.2 represents the possible act-event combinations by a
tree diagram.

TABLE 15.2.2 Payoff table for Example 15.2.1

                                  Acts
Events            Market product    Do not market product

Demand strong        $10,000                  0
Demand weak            2,000                  0
No demand             -3,000                  0


It is the role of the decision-maker to select a course of action. If the decision-maker
knows the event or state of nature that will exist after the act has been
chosen, there is no problem in making a decision. A decision-maker who knows
with certainty, for example, that demand will be strong will market the product.
Even if the demand is weak, it would be more profitable to market the product
than not. Only when there is no demand would it be more profitable not to
market it.
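
The payoff table of Example 15.2.1 and the decision under certainty just described can be sketched in code. This is only an illustration: the nested-dictionary layout and the function names are assumptions made here, while the payoffs are those of Table 15.2.2.

```python
# Payoff table for Example 15.2.1, stored as payoffs[event][act] in dollars.
payoffs = {
    "Demand strong": {"Market product": 10_000, "Do not market product": 0},
    "Demand weak": {"Market product": 2_000, "Do not market product": 0},
    "No demand": {"Market product": -3_000, "Do not market product": 0},
}


def payoff(table, event, act):
    """Look up the payoff for one act-event combination."""
    return table[event][act]


def best_act_given_event(table, event):
    """With the event known for certain, choose the act with the largest payoff."""
    row = table[event]
    return max(row, key=row.get)


for event in payoffs:
    print(event, "->", best_act_given_event(payoffs, event))
```

The loop reproduces the reasoning of the paragraph above: the product is marketed under strong or weak demand and withheld only when there is no demand.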


FIGURE 15.2.2 Tree diagram for Example 15.2.1


EXAMPLE 15.2.2 A firm has an advertising budget of $20,000 with which to promote
a certain product. The firm can spend the entire $20,000 on television
advertising, spend it all on newspaper advertising, or divide it between the two.
Assume that the decision-maker knows that if a large proportion of the potential
customers watch television, and all the budget is spent on TV advertising, the
payoff will be $60,000. This is the payoff in cell 1 of the payoff table. Assume
further that the decision-maker has been able to prepare additional entries as shown
in Table 15.2.3. Figure 15.2.3 shows a tree diagram for this situation.

TABLE 15.2.3 Payoff table for Example 15.2.2

                                                      Acts
Event                            All TV adv.    All newspaper adv.    Half-and-half

Most customers watch TV only       $60,000           $ 2,000             $30,000
Most read newspapers only            3,000            40,000              10,000

FIGURE 15.2.3 Tree diagram for Example 15.2.2

The firm could divide the advertising between the two media in a proportion
other than half-and-half. However, we will assume that for some reason this seems
to be the best allocation. Likewise, for simplicity, we will assume that no other
event is possible. Again, having infallible information about the future would
make the choice simple. If the decision-maker knew that most of the potential
customers watch television, he or she would select the act, “spend the entire
$20,000 on TV advertising.” If the decision-maker knew that newspapers could
reach more prospective customers, he or she would spend the entire $20,000 on
newspaper advertising.

EXAMPLE 15.2.3 Suppose that a company moving out of a building has been able
to construct the payoff table shown in Table 15.2.4, relative to the acts “sell” or
“lease” the old building and the events “the area deteriorates” or “the area does
not deteriorate.”

TABLE 15.2.4 Payoff table for Example 15.2.3

                                       Acts
Events                          Sell          Lease

Area deteriorates             $250,000     $   95,000
Area does not deteriorate      250,000      1,000,000

The decision-maker must decide whether to sell or lease. If the decision-maker
knows that the area is not going to deteriorate, the decision will be to lease the
building. If, however, the decision-maker knows for sure that the area is on the
verge of rapid deterioration, the decision will be to sell.

Unfortunately, typical decision-makers do not know for sure what event will
take place. They must therefore make decisions in the face of uncertainty. With
this responsibility, they need some criteria for making a choice among acts.

The Maximin Criterion

The maximin criterion for choosing among acts assumes that the worst will happen.
That is, it assumes that the most undesirable state of nature will prevail at
that time in the future when the decision-maker has selected the act. Given this 
assumption, then, the decision-maker chooses that act that gives the maximum of 
the minimum payoffs. Combining appropriate syllables from these words gives us 
the word maximin. 

Let us apply this criterion to our previous examples. In the case of the company 
about to market a new product, we would assume, under the maximin criterion, 
that if the new product is marketed there will be no demand. The payoff will thus 
be -$3000. If the product is not marketed, the payoff is 0 in any event. The act 
to choose would be the act, “do not market the product,” since 0 is greater than 
-$3000. 

Consider Example 15.2.2, in which the firm must decide how to spend its 
advertising budget. In Table 15.2.5 the payoffs for each act under the most undesirable
state of nature are enclosed in boxes. These are the minimum payoffs.
The largest is $10,000. Thus the maximin criterion would lead to the decision to 
divide the advertising equally between TV and newspapers. 

In Example 15.2.3 the firm with a building to dispose of would sell if it followed 
the maximin criterion, since a sure payoff of $250,000 is greater than the $95,000 
payoff that would result from the worst possible event given the act, “lease.” 
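
The maximin reasoning applied to the three examples above can be sketched as follows. The function name and the layout of the tables (one list of payoffs per act, one entry per event) are assumptions made for this illustration; the payoffs are those of Tables 15.2.2, 15.2.3, and 15.2.4.

```python
def maximin(table):
    """Return the act whose worst payoff is largest, plus each act's worst payoff.

    table[act] is the list of payoffs for that act, one entry per event.
    """
    worst = {act: min(payoffs) for act, payoffs in table.items()}
    best_act = max(worst, key=worst.get)
    return best_act, worst


# Example 15.2.1: events are demand strong, demand weak, no demand.
example_15_2_1 = {
    "Market product": [10_000, 2_000, -3_000],
    "Do not market product": [0, 0, 0],
}
# Example 15.2.2: events are most watch TV only, most read newspapers only.
example_15_2_2 = {
    "All TV adv.": [60_000, 3_000],
    "All newspaper adv.": [2_000, 40_000],
    "Half-and-half": [30_000, 10_000],
}
# Example 15.2.3: events are area deteriorates, area does not deteriorate.
example_15_2_3 = {
    "Sell": [250_000, 250_000],
    "Lease": [95_000, 1_000_000],
}

for name, table in [("15.2.1", example_15_2_1),
                    ("15.2.2", example_15_2_2),
                    ("15.2.3", example_15_2_3)]:
    act, worst = maximin(table)
    print(f"Example {name}: choose '{act}' (minimum payoffs: {worst})")
```

The acts selected agree with the discussion above: do not market the product, divide the advertising half-and-half, and sell the building.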



TABLE 15.2.5 Payoff table for Example 15.2.2 (boxed minimum payoffs are shown in brackets)

                                                  Acts
Event                        All TV adv.    All newspaper adv.    Half-and-half

Most customers watch TV        $60,000          [$ 2,000]            $30,000
Most read newspapers only      [ 3,000]           40,000            [10,000]


Minimax Criterion

We may also construct a payoff table with opportunity losses in the body of the
table rather than payoffs. An opportunity loss is the difference between the payoff
for a particular event and the payoff that we would have realized had we selected
the best act for that event. When the payoff is in dollars, opportunity loss represents
the amount of profit that we lost because we did not select the most profitable
act. Under the minimax criterion, the decision-maker expects the worst event to
happen and so selects the act that will give the minimum of the maximum opportunity
losses.

EXAMPLE 15.2.4 A manufacturer who wants to increase production of a product
can choose between two courses of action. The firm can add an extra shift and
produce the additional items, or subcontract to another firm. Future demand for
the product may be strong or it may be weak. From these contingencies, the
manufacturer can construct Table 15.2.6. From that table the opportunity-loss
table, Table 15.2.7, can be constructed.

TABLE 15.2.6 Payoff table for Example 15.2.4

                          Acts
Event             Add shift    Subcontract

Strong demand      $30,000       $20,000
Weak demand          3,000         5,000

TABLE 15.2.7 Opportunity loss table for Example 15.2.4

                          Acts
Event             Add shift    Subcontract

Strong demand          0         $10,000
Weak demand         $2,000           0

Using the minimax criterion, if the firm adds a shift, the expectation is weak
demand and a maximum opportunity loss of $2000. If the firm subcontracts, the
expectation is strong demand and a maximum opportunity loss of $10,000. The
minimum of these opportunity losses is $2000. Hence the manufacturer chooses
to add a shift.


The Maximax Criterion

As we have seen, the minimax criterion is a criterion for the pessimist. A decision
rule for optimists is the maximax criterion, which assumes that for any act the
event with the maximum payoff will take place. The optimistic decision-maker
then chooses the act that will yield the maximum of these maximum payoffs.
Using this criterion in the problems posed in Examples 15.2.1, 15.2.2, and 15.2.3,
we would choose the following acts:

1. In the case of the firm trying to decide whether or not to market a product,
we would decide to market the product.

2. In trying to decide where to spend the advertising budget, we would optimistically
believe that most customers watch TV only, and therefore spend our entire
advertising budget on TV ads.

3. The firm with the surplus building would elect to lease the building.


Hurwicz Criterion


The Bayes Criterion


TABLE 15.2.8 Payoff table for Example 15.2.2

For those whose outlook lies between the extremes of the pessimists and the 
optimists, Hurwicz (1951) has suggested a compromise. Using the Hurwicz cri¬ 
terion, the decision-maker takes a weighted average of the maximum and mini¬ 
mum payoffs for each act, then chooses the act with the largest weighted average. 
The weights used represent what the decision-maker feels are the probabilities of 
occurrence of the maximum and minimum payoffs. Given the payoff matrix of 
Table 15.2.3, reproduced as Table 15.2.8, suppose that the decision-maker feels 
that the probability of a maximum payoff is 3/4 and, consequently, the probability 
of a minimum payoff is 1/4. The decision-maker would evaluate the three acts 
as follows: 


Spend entire budget on TV advertising: 

$60,000 (3/4) + $3000 (1/4) = $45,750 

Spend entire budget on newspaper advertising: 

2000 (1/4) + $40,000 (3/4) = $30,500 

Spend half on TV and half on newspaper: 

30,000 (3/4) + $10,000 (1/4) = $25,000 

The decision-maker would then choose the first act, since it gives the maximum 
weighted average. Figure 15.2.4 shows the tree diagram reflecting this decision 
situation. The dollar values shown at the right end of the event branches are the 
payoffs. The weighted averages for each act are shown at the origin of the event 
branches. The probabilities appear on their respective event branches. 

The Bayes criterion, which enables us to apply the subjective probability concepts 
discussed in Chapter 3, is a decision rule that gives the decision-maker a mech¬ 
anism for maximizing expected profit or minimizing expected opportunity loss, 




Acts 


Event 

All TV adv. 

All newspaper adv. 

Half-and-half 

Most customers watch TV only 

$60,000 

$ 2,000 

$30,000 

Most read newspapers only 

3,000 

40,000 

10,000 



FIGURE 15.2.4 
Tree diagram for 
Example 15.2.2, 
showing 
application of 
Hurwicz criterion 



depending on the situation. To use the Bayes criterion, the decision-maker must 
be able to assign a probability to each specified event or state of nature. The sum 
of these probabilities must be 1. These probabilities represent the strength of the 
decision-maker’s feeling about the likelihood of occurrence of the various events. 
Because the process generating these probabilities is usually subjective, many 
people reject the Bayes criterion and related theory. 

After identifying the relevant future events and assigning the probabilities, the 
decision-maker computes the expected payoff for each act and chooses the act 
with the best expected payoff. If the payoffs represent income or profit, the
decision-maker chooses the act with the highest expected payoff. If, on the other
hand, the payoffs represent opportunity losses, or costs, the decision-maker selects 
the act with the lowest expected payoff. 

EXAMPLE 15.2.5 A real-estate developer has a piece of land adjacent to a larger
tract of land that is soon to be zoned for either industrial, office park, or residential
use. She must decide how to develop the property before the zoning decision has
been made for the larger tract. She can develop the property for a grocery store,
a restaurant, or a service station. To evaluate the situation, she has been able to
construct Table 15.2.9, in which payoffs represent net realizable profit over the
next five years.

TABLE 15.2.9 Payoff table for Example 15.2.5

                                              Acts
Events             Construct grocery store    Construct restaurant    Construct service station
Industrial park           $10,000                   $18,000                  $25,000
Office park                10,000                    50,000                   15,000
Residential                60,000                    15,000                   20,000

The developer’s analysis has also led her to have a fairly strong opinion about 
the probabilities of the various outcomes of the zoning decision. She feels that it 
is twice as likely that the adjacent tract of land will be zoned for residential use 
as for an office park. She feels that an office park is as likely as an industrial 
park. In other words, she assigns the probabilities 0.50, 0.25, and 0.25 to the 
events residential, office park, and industrial park, respectively. 

We compute the expected payoff for each of the three acts by multiplying the 
payoffs for each event by their assigned probabilities and summing these products. 
Performing these calculations leads to the following expected payoffs. 

Act                Computations                                         Expected payoff
Grocery store      $10,000(0.25) + $10,000(0.25) + $60,000(0.5)    =       $35,000
Restaurant          18,000(0.25) +  50,000(0.25) +  15,000(0.5)    =        24,500
Service station     25,000(0.25) +  15,000(0.25) +  20,000(0.5)    =        20,000

The developer decides to construct a grocery store, because the maximum expected
payoff is associated with that act. Since she selects this act under conditions
of uncertainty, the expected profit, $35,000, is referred to as the expected profit 
under uncertainty. The act is referred to as the optimal act. 
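
The Bayes-criterion arithmetic of Example 15.2.5 can be sketched in a few lines of Python. This is only an illustration: the payoffs and probabilities come from Table 15.2.9 and the text, while the names `payoffs`, `prior`, and `expected_payoffs` are introduced here and are not part of the original.

```python
# Payoffs from Table 15.2.9 (net realizable profit over five years), by act and event.
payoffs = {
    "grocery store":   {"industrial park": 10_000, "office park": 10_000, "residential": 60_000},
    "restaurant":      {"industrial park": 18_000, "office park": 50_000, "residential": 15_000},
    "service station": {"industrial park": 25_000, "office park": 15_000, "residential": 20_000},
}

# The developer's subjective probabilities for the zoning events.
prior = {"industrial park": 0.25, "office park": 0.25, "residential": 0.50}


def expected_payoffs(payoffs, prior):
    """Return {act: expected payoff}: each payoff weighted by its event probability."""
    return {
        act: sum(prior[event] * value for event, value in row.items())
        for act, row in payoffs.items()
    }


ep = expected_payoffs(payoffs, prior)
optimal_act = max(ep, key=ep.get)  # payoffs are profits, so the largest is best

for act, value in ep.items():
    print(f"{act:16s} {value:>10,.0f}")
print("optimal act:", optimal_act)
```

If the table held costs or opportunity losses instead of profits, `min` would replace `max` in the selection step.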

Sections 15.3 and 15.5 give more illustrations of the use of the Bayes criterion. 
[For further reading on the decision criteria presented here, see the works of 
Schlaifer (1969, 1981), Wald (1950), Weiss (1961), and Luce and Raiffa (1957). 
For treatments of the subject that are less demanding mathematically, see Dyckman
et al. (1969), Miller and Starr (1969), Baumol (1972), Winkler (1972), and
Sasaki (1968). The papers of Savage (1951) and Roberts (1960) are easily
understood and informative introductions to decision theory.]
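
The criteria of this section that do not use event probabilities (maximin, minimax applied to opportunity losses, maximax, and Hurwicz) can be sketched as short Python functions. This is a minimal illustration, not a definitive implementation: the function names and the dictionary layout are assumptions made here, and the data are the payoff tables of Examples 15.2.2 and 15.2.4.

```python
def maximin(payoffs):
    """Act whose worst (minimum) payoff is largest."""
    return max(payoffs, key=lambda act: min(payoffs[act].values()))


def maximax(payoffs):
    """Act whose best (maximum) payoff is largest."""
    return max(payoffs, key=lambda act: max(payoffs[act].values()))


def hurwicz(payoffs, weight_on_max):
    """Weighted average of each act's maximum and minimum payoff; returns (best act, scores)."""
    scores = {
        act: weight_on_max * max(row.values()) + (1 - weight_on_max) * min(row.values())
        for act, row in payoffs.items()
    }
    return max(scores, key=scores.get), scores


def opportunity_losses(payoffs):
    """For each event, best payoff for that event minus the payoff of each act."""
    events = list(next(iter(payoffs.values())))
    best = {e: max(payoffs[act][e] for act in payoffs) for e in events}
    return {act: {e: best[e] - payoffs[act][e] for e in events} for act in payoffs}


def minimax_loss(payoffs):
    """Act whose maximum opportunity loss is smallest (the minimax criterion of the text)."""
    losses = opportunity_losses(payoffs)
    return min(losses, key=lambda act: max(losses[act].values()))


# Example 15.2.2 (advertising budget).
advertising = {
    "all TV":        {"watch TV only": 60_000, "read newspapers only": 3_000},
    "all newspaper": {"watch TV only": 2_000,  "read newspapers only": 40_000},
    "half-and-half": {"watch TV only": 30_000, "read newspapers only": 10_000},
}
# Example 15.2.4 (extra shift versus subcontracting).
production = {
    "add shift":   {"strong demand": 30_000, "weak demand": 3_000},
    "subcontract": {"strong demand": 20_000, "weak demand": 5_000},
}

print(maximin(advertising))
print(maximax(advertising))
print(hurwicz(advertising, 0.75))
print(opportunity_losses(production))
print(minimax_loss(production))
```

The same functions can be applied to the payoff tables of Exercises 15.2.1 through 15.2.4 by entering each table in the act-by-event layout used above.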

15.2.1 A firm is considering whether or not to sponsor a certain television program. It
constructs the following payoff table. (a) The firm’s advertising manager applies the
maximin criterion. What course of action is indicated? (b) The manager applies the minimax
criterion. What course of action is indicated? (c) The manager applies the maximax
criterion. What course of action is indicated? (d) The firm decides to use the Hurwicz
criterion. What act results? Let P(maximum payoff) = 0.5 and P(minimum payoff) =
0.5. (e) The advertising manager feels that the probabilities associated with events E_1, E_2,
and E_3 are 0.4, 0.3, and 0.3, respectively. Apply the Bayes criterion. What act is
suggested?

                                                             Acts
Events                                   Sponsor program (A_1)    Do not sponsor program (A_2)
Very favorable viewer reaction (E_1)          $400,000                       $0
Favorable viewer reaction (E_2)                150,000                        0
Unfavorable viewer reaction (E_3)             -250,000                        0



15.2.2 A manufacturer of corn crisps wants to reach a decision about adopting a new 
package. The possible acts are: (1) adopt the new package, (2) stick with the old package, 
and (3) give buyers a choice by packaging half the production in the new and half in the 
old. The possible customer reactions (events) are: (1) preference for new package, (2) 
preference for old package, and (3) indifference. The manufacturer constructs the following 
payoff table. Determine the appropriate course of action using: (a) the maximin criterion, 
(b) the minimax criterion, (c) the maximax criterion, (d) the Hurwicz criterion, letting P 
(max payoff) = 0.7 and P (min payoff) = 0.3, (e) the Bayes criterion, assuming that the 
probabilities associated with the three events—prefer new, prefer old, and indifferent— 
are 0.7, 0.2, and 0.1, respectively. 


                                           Acts
Events               New (A_1)      Old (A_2)      Half-and-half (A_3)
Prefer new (E_1)     $4,000,000     $1,000,000         $2,000,000
Prefer old (E_2)        400,000      3,500,000          1,500,000
Indifferent (E_3)     3,000,000      3,000,000          3,000,000

15.2.3 Given the following payoff table, determine the appropriate act under the (a)
maximin criterion, (b) minimax criterion, (c) maximax criterion, (d) Hurwicz criterion,
P(max payoff) = 0.8, P(min payoff) = 0.2, (e) Bayes criterion.

                    Acts
Events     A_1     A_2     A_3     P(E_i)
E_1        100      50      80      0.5
E_2         75     150      25      0.3
E_3         25      50     160      0.2

15.2.4 Consider the following payoff table. What is the appropriate act when applying
the (a) maximin criterion, (b) minimax criterion, (c) maximax criterion, (d) Hurwicz
criterion, P(max payoff) = 0.7, P(min payoff) = 0.3, (e) Bayes criterion?

                    Acts
Events     A_1     A_2     A_3     P(E_i)
E_1        100      50      80      0.6
E_2         75     150      25      0.3
E_3         25      50     160      0.1


15.2.5 Describe a business situation in which decision theory might be appropriate.
Construct a payoff table and do steps (a) through (e) as in Exercise 15.2.3.








15.3 APPLICATION OF THE BAYES CRITERION 


Bayes’ theorem, as discussed in Chapter 3, has recently assumed an important
role in statistical decision theory. In fact, the terms “Bayesian theory” and
“statistical decision theory” are often used interchangeably. However, Bayesian
theory is only a subset of statistical decision theory. Other subsets include utility
theory, discussed in Section 15.4, and payoff analysis, which we covered in
Section 15.2.

This section illustrates the use of the Bayesian decision criterion by extending
Example 15.2.5. We will discuss three possible phases of a business decision
process and illustrate them by means of this example. These three phases, in order
of their occurrence, are prior analysis, preposterior analysis, and posterior
analysis.

First, you need to understand the concept known as the expected value of perfect 
information (EVPI). The value of EVPI in a given situation is the maximum 
amount of money that you should spend for additional information. For example, 
if you can get additional information through sampling, the EVPI figure indicates 
the maximum amount of money that you should spend on the sampling process. 

You find the expected value of perfect information in a given situation by 
subtracting the expected profit under uncertainty from the expected profit with 
perfect information. In Example 15.2.5, we calculated the expected profit under 
uncertainty for the three available acts. We found these quantities to be $35,000, 
$24,500, and $20,000 for the acts “build a grocery store,” “build a restaurant,”
and “build a service station,” respectively. We found that the decision-maker 
should choose the first act, since it promised the maximum profit. 

We now need to find the expected profit given perfect information. This is 
calculated on the assumption that the decision-maker has a perfect predictor
available. When this perfect predictor indicates that a certain event will occur, the
decision-maker chooses the optimal act for that event. We now attach a slightly 
different interpretation to the probabilities assigned to the events. We interpret 
each one as the relative frequency with which the perfect predictor predicts the 
associated event. For example, we would now say that when faced with the present 
situation a great many times, the predictor would predict residential zoning 50% 
of the time, office-park zoning 25% of the time, and industrial-park zoning 25% 
of the time. Table 15.3.1 shows the calculation of the expected profit with perfect 
information for Example 15.2.5. 


TABLE 15.3.1 Calculation of expected profit with perfect information for Example 15.2.5

Event              Profit for optimal act    Probability    Weighted profit
Industrial park           $25,000               0.25            $ 6,250
Office park                50,000               0.25             12,500
Residential                60,000               0.50             30,000
                                                                $48,750

The figure $48,750 is the expected profit with perfect information. We
can interpret this as the average profit that would be realized in the long run if
the decision-maker, repeatedly faced with this same problem, each time took the
optimal act associated with the event predicted by the predictor. Some writers call
this value the expected profit under certainty. We now have enough information
to compute EVPI = expected profit with perfect information - expected profit
under uncertainty = $48,750 - $35,000 = $13,750.

Note that the expected value of perfect information is, in general, equal to the
expected opportunity loss of selecting the optimum act in an uncertain
environment. (We discussed opportunity loss in Section 15.2.) Computing the expected
opportunity loss under uncertainty using the Bayesian criterion is like computing
the expected profit under uncertainty. Computing that quantity for the present
example, we find that it is equal to $13,750, the expected value of perfect
information that we just computed. When we convert the payoff table, Table 15.2.9,
to an opportunity loss table, we have Table 15.3.2.

The optimal act, considering expected opportunity losses, would be to build a
grocery store, since that would minimize the expected opportunity loss. This
value, $13,750, is equal to the expected value of perfect information. Some writers
refer to the expected opportunity loss for the optimal act under uncertainty as the
cost of uncertainty. Thus the following three terms mean the same thing: expected
value of perfect information, expected opportunity loss for the optimal act under
uncertainty, and cost of uncertainty.


Prior Analysis

In the foregoing illustrations, the probabilities that we assigned to events were 
prior probabilities. They are called prior because the decision-maker formulated 
them prior to acquiring experimental or sampling information. As a rule, these 
prior probabilities are subjective, representing decision-makers’ best estimates of 
the relative likelihood of the various events. The analysis that is carried out using 
these prior probabilities is called prior analysis. 
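
The prior analysis of Example 15.2.5, including the two routes to the EVPI described in this section, can be sketched as follows. The data are those of Table 15.2.9; the variable names are introduced here for illustration only.

```python
# Payoffs from Table 15.2.9, by act and event, and the developer's prior probabilities.
payoffs = {
    "grocery store":   {"industrial park": 10_000, "office park": 10_000, "residential": 60_000},
    "restaurant":      {"industrial park": 18_000, "office park": 50_000, "residential": 15_000},
    "service station": {"industrial park": 25_000, "office park": 15_000, "residential": 20_000},
}
prior = {"industrial park": 0.25, "office park": 0.25, "residential": 0.50}

# Expected profit under uncertainty: the best expected payoff over the acts.
expected = {a: sum(prior[e] * payoffs[a][e] for e in prior) for a in payoffs}
profit_under_uncertainty = max(expected.values())

# Expected profit with perfect information: for each event take the best act's
# payoff, then weight by the probability of that event (Table 15.3.1).
best_by_event = {e: max(payoffs[a][e] for a in payoffs) for e in prior}
profit_with_perfect_info = sum(prior[e] * best_by_event[e] for e in prior)

evpi = profit_with_perfect_info - profit_under_uncertainty

# Second route: expected opportunity loss of each act (Table 15.3.2).
expected_loss = {
    a: sum(prior[e] * (best_by_event[e] - payoffs[a][e]) for e in prior)
    for a in payoffs
}

print(profit_with_perfect_info, profit_under_uncertainty, evpi)
print(expected_loss)
```

The smallest entry of `expected_loss` belongs to the grocery store and coincides with `evpi`, which is the equality between the EVPI and the cost of uncertainty noted in the text.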

Following the prior analysis, decision-makers must decide either to get more 
information or to take the final action indicated by the prior analysis. They can 
get more information by conducting a survey, by carrying out an experiment, or 
by some other means. This additional information is usually called sample
information, no matter how they acquire it.

If decision-makers elect to take the action indicated by the prior analysis, no
further analyses are necessary. They forge ahead and await the consequences. If,
however, they decide to get more information, they may find that this new
information causes them to substitute new probabilities for the prior ones. With new
probabilities for the various events, decision-makers will perform another analysis 
using this new information. They recompute the expected profit for the various 
acts. They obtain these new probabilities, called posterior probabilities, by using 
Bayes’ theorem. The subsequent analysis with the new probabilities is called 
posterior analysis. 

The question here is whether it’s worthwhile to obtain further information. In 
general, additional information is bought at a price. Decision-makers must decide 
whether the potential result is worth the cost. 



TABLE 15.3.2 Opportunity-loss table for real-estate developer

                                          Acts
Events             Construct         Construct      Construct
                   grocery store     restaurant     service station
Industrial park      $15,000          $ 7,000           $0
Office park           40,000                0           35,000
Residential                0           45,000           40,000

Expected opportunity loss for each of the three acts:

ACT: CONSTRUCT GROCERY STORE

Event              Probability    Opportunity loss    Weighted opportunity loss
Industrial park       0.25            $15,000                $ 3,750
Office park           0.25             40,000                 10,000
Residential           0.50                  0                      0
Expected opportunity loss                                  = $13,750

ACT: CONSTRUCT RESTAURANT

Event              Probability    Opportunity loss    Weighted opportunity loss
Industrial park       0.25            $ 7,000                $ 1,750
Office park           0.25                  0                      0
Residential           0.50             45,000                 22,500
Expected opportunity loss                                  = $24,250

ACT: CONSTRUCT SERVICE STATION

Event              Probability    Opportunity loss    Weighted opportunity loss
Industrial park       0.25                 $0                     $0
Office park           0.25             35,000                  8,750
Residential           0.50             40,000                 20,000
Expected opportunity loss                                  = $28,750


Preposterior Analysis


The objective in preposterior analysis is to find out whether it’s worthwhile to 
gather further information (to sample) before taking final action. 

EXAMPLE 15.3.1 The owner of a chain of nursing homes wants to open a new
facility in a certain area. He usually builds 25-, 50-, or 100-bed facilities,
depending on whether anticipated demand is low, medium, or high. He has
constructed Table 15.3.3 on the basis of past experience. The payoffs in Table 15.3.3
are short-range net profits.

TABLE 15.3.3 Payoff table for Example 15.3.1

                                              Acts
Events             Build 25-bed facility    Build 50-bed facility    Build 100-bed facility
Low demand               $30,000                 -$20,000                 -$40,000
Medium demand             35,000                   50,000                  -10,000
High demand               40,000                   55,000                   75,000

On the basis of his information about the area, the owner feels that the
probabilities of low demand (E_1), medium demand (E_2), and high demand (E_3) are
0.1, 0.4, and 0.5, respectively. Formally, we may write P(E_1) = 0.1, P(E_2) =
0.4, and P(E_3) = 0.5. The expected payoffs for each act are as follows.

Act                        Expected payoff
Build 25-bed facility:     30,000(0.1) + 35,000(0.4) + 40,000(0.5) = $37,000
Build 50-bed facility:     (-20,000)(0.1) + 50,000(0.4) + 55,000(0.5) = $45,500
Build 100-bed facility:    (-40,000)(0.1) + (-10,000)(0.4) + 75,000(0.5) = $29,500


Suppose that the owner decides not to collect additional information. That is, 
he will base his decision on this prior analysis. His decision would be to build a 
50-bed facility, since that has the highest expected payoff. Figure 15.3.1 is a tree 
diagram for the decision options and possible outcomes for this situation. 

We can now compute the expected value of perfect information. However, we 
first need to calculate the expected profit with perfect information, as shown in 
Table 15.3.4. Thus we find EVPI = $60,500 - $45,500 = $15,000. 

Recall that we can also interpret this value, $15,000, both as the expected 
opportunity loss for the optimal act under uncertainty and as the cost of
uncertainty. Since the decision-maker can do no better than obtain perfect information,
this figure places an upper bound on the amount that he is willing to pay for 
sample information that he knows will be something less than perfect. 

TABLE 15.3.4 Calculation of expected profit with perfect information. Example 15.3.1

Event              Profit for optimal act    Probability    Weighted profit
Low demand               $30,000                0.1             $ 3,000
Medium demand             50,000                0.4              20,000
High demand               75,000                0.5              37,500
Expected profit with perfect information                      = $60,500

FIGURE 15.3.1 Tree diagram for decision options and possible outcomes for Example 15.3.1 without collecting additional information

The owner of the chain of nursing homes at this point decides to look into the
idea of conducting a survey in the area to get an updated estimate of demand. A
research firm will conduct the survey for $5000 and provide information that will
be translated into an estimate of either low demand (X_1), medium demand (X_2),
or high demand (X_3), depending on the results of the survey. The research firm
describes the reliability of these estimates as follows: Over many years of doing
such surveys, the firm has found that when the true demand for a thing is low,
sample evidence indicates a low demand about 75% of the time. The evidence
indicates a medium demand about 15% of the time, and a high demand about
10% of the time. These are the conditional probabilities of a sample estimate,
given a specific event. For example, 0.75 is the conditional probability of a sample
estimate of low demand, given that the true event or state of nature is low demand.
When the true demand is medium, the sample evidence indicates medium demand
about 80% of the time, low demand about 8% of the time, and high demand about
12% of the time. When the true demand is high, sample estimates of low, medium,
and high demand occur with relative frequencies of 0.05, 0.10, and 0.85,
respectively. Table 15.3.5 displays the research firm’s evaluation of the conditional
probabilities, where the cell entries are the P(X_j | E_i)’s.


TABLE 15.3.5 Conditional probabilities for Example 15.3.1

                                            Sample estimates
True events             Low demand (X_1)    Medium demand (X_2)    High demand (X_3)
Low demand (E_1)              0.75                 0.15                  0.10
Medium demand (E_2)           0.08                 0.80                  0.12
High demand (E_3)             0.05                 0.10                  0.85


Next the decision-maker wants to know the probability that a given event is the 
true state of nature and that the sample estimates it as such. These probabilities 
are the joint probabilities of each state of nature and each sample estimate. They 
are found by multiplying the prior probability of the event by the conditional 
probability of the sample estimate, given the event. For example, we find the 
probability of the true event being low demand and the sample evidence indicating 
low demand by multiplying 0.1 by 0.75 to give 0.075. Symbolically, 

\[ P(E_1 \cap X_1) = P(E_1)\,P(X_1 \mid E_1) \]

Table 15.3.6 displays these joint probabilities. In the table, note that the row totals 
are the marginal probabilities of the respective events. These are equal to the prior 
probabilities. The column totals are the marginal probabilities for the respective 
sample estimates, given all events. 
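
The construction of the joint-probability table can be sketched in Python. The prior probabilities and the conditional probabilities are those of Example 15.3.1 and Table 15.3.5; the labels "E1", "X1", and so on, and the variable names, are introduced here for illustration.

```python
# Prior probabilities of the true events (low, medium, high demand).
prior = {"E1": 0.1, "E2": 0.4, "E3": 0.5}

# Conditional probabilities P(X_j | E_i) from Table 15.3.5: one row per true event.
likelihood = {
    "E1": {"X1": 0.75, "X2": 0.15, "X3": 0.10},
    "E2": {"X1": 0.08, "X2": 0.80, "X3": 0.12},
    "E3": {"X1": 0.05, "X2": 0.10, "X3": 0.85},
}

# Joint probability of each (event, estimate) pair: prior times conditional.
joint = {
    e: {x: prior[e] * p for x, p in row.items()}
    for e, row in likelihood.items()
}

# Row totals recover the priors; column totals are the marginal
# probabilities of the sample estimates.
row_totals = {e: sum(joint[e].values()) for e in joint}
column_totals = {x: sum(joint[e][x] for e in joint) for x in ("X1", "X2", "X3")}

for e in joint:
    print(e, {x: round(p, 3) for x, p in joint[e].items()}, round(row_totals[e], 3))
print({x: round(p, 3) for x, p in column_totals.items()})
```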

At this point, we use Bayes’ theorem to incorporate information that may be 
provided by the possible outcomes of sampling. Applying Bayes’ theorem yields 
what are called posterior probabilities. Given a particular sample estimate, we 
find the probability that a specific event is the true event. Symbolically we shall 
evaluate 

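\[
P(E_i \mid X_j) = \frac{P(E_i)\,P(X_j \mid E_i)}{\sum_i P(E_i)\,P(X_j \mid E_i)} \tag{15.3.1}
\]

For example, the probability of low demand (E_1), given a sample estimate of low
demand (X_1), is

\[
P(E_1 \mid X_1) = \frac{(0.1)(0.75)}{(0.1)(0.75) + (0.4)(0.08) + (0.5)(0.05)} = \frac{0.075}{0.132} = 0.5682
\]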

Table 15.3.7 shows the posterior probabilities arrived at from the data of this 
example. 
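
The posterior probabilities of Table 15.3.7 follow from Equation 15.3.1 and can be reproduced with a short sketch. The function name `posterior_table` and the dictionary layout are assumptions made for this illustration; the inputs are the priors and conditional probabilities of Example 15.3.1.

```python
def posterior_table(prior, likelihood):
    """Return {estimate: {event: P(event | estimate)}} using Bayes' theorem."""
    estimates = list(next(iter(likelihood.values())))
    table = {}
    for x in estimates:
        # Denominator of Equation 15.3.1: marginal probability of the estimate.
        marginal = sum(prior[e] * likelihood[e][x] for e in prior)
        table[x] = {e: prior[e] * likelihood[e][x] / marginal for e in prior}
    return table


prior = {"E1": 0.1, "E2": 0.4, "E3": 0.5}
likelihood = {
    "E1": {"X1": 0.75, "X2": 0.15, "X3": 0.10},
    "E2": {"X1": 0.08, "X2": 0.80, "X3": 0.12},
    "E3": {"X1": 0.05, "X2": 0.10, "X3": 0.85},
}

posterior = posterior_table(prior, likelihood)
for x, column in posterior.items():
    print(x, {e: round(p, 4) for e, p in column.items()})
```

Each column of the result sums to 1, as the Total row of Table 15.3.7 indicates.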

The next step in preposterior analysis is to use the posterior probabilities from 
Table 15.3.7 to find the expected payoffs for each act, given the various possible 
sample estimates. Table 15.3.8 shows the computation and results of this analysis. 

TABLE 15.3.6 Joint probabilities for Example 15.3.1

          Prior                       Joint probability                     Total
Event     probability    P(E_i ∩ X_1)    P(E_i ∩ X_2)    P(E_i ∩ X_3)      P(E_i)
E_1          0.1            0.075           0.015           0.010           0.1
E_2          0.4            0.032           0.320           0.048           0.4
E_3          0.5            0.025           0.050           0.425           0.5
Total        1.0            0.132           0.385           0.483           1.0

TABLE 15.3.7 Posterior probabilities of specific events given a particular sample estimate. Example 15.3.1

                       Posterior probabilities
Event E_i    P(E_i | X_1)    P(E_i | X_2)    P(E_i | X_3)
E_1             0.5682          0.0390          0.0207
E_2             0.2424          0.8312          0.0994
E_3             0.1894          0.1299          0.8799
Total           1.0000          1.0000          1.0000

TABLE 15.3.8 Expected payoffs, preposterior analysis. Example 15.3.1

ACT: BUILD A 25-BED FACILITY

1. Low demand                        2. Medium demand                     3. High demand
(30,000)(0.5682) = $17,046.00        (30,000)(0.0390) = $ 1,170.00        (30,000)(0.0207) = $   621.00
(35,000)(0.2424) =   8,484.00        (35,000)(0.8312) =  29,092.00        (35,000)(0.0994) =   3,479.00
(40,000)(0.1894) =   7,576.00        (40,000)(0.1299) =   5,196.00        (40,000)(0.8799) =  35,196.00
Expected payoff:   $33,106.00                           $35,458.00                           $39,296.00

ACT: BUILD A 50-BED FACILITY

1. Low demand                        2. Medium demand                     3. High demand
(-20,000)(0.5682) = $-11,364.00      (-20,000)(0.0390) = $  -780.00       (-20,000)(0.0207) = $  -414.00
(50,000)(0.2424)  =   12,120.00      (50,000)(0.8312)  =  41,560.00       (50,000)(0.0994)  =   4,970.00
(55,000)(0.1894)  =   10,417.00      (55,000)(0.1299)  =   7,144.50       (55,000)(0.8799)  =  48,394.50
Expected payoff:    $ 11,173.00                          $47,924.50                           $52,950.50

ACT: BUILD A 100-BED FACILITY

1. Low demand                        2. Medium demand                     3. High demand
(-40,000)(0.5682) = $-22,728.00      (-40,000)(0.0390) = $-1,560.00       (-40,000)(0.0207) = $  -828.00
(-10,000)(0.2424) =   -2,424.00      (-10,000)(0.8312) =  -8,312.00       (-10,000)(0.0994) =    -994.00
(75,000)(0.1894)  =   14,205.00      (75,000)(0.1299)  =   9,742.50       (75,000)(0.8799)  =  65,992.50
Expected payoff:    $-10,947.00                          $  -129.50                           $64,170.50

Suppose that the owner decides to engage the research firm. We know that he
will get one of three possible results: an indication that the demand is either low,
medium, or high. If the sample estimates a low demand, the expected payoffs are
as follows.

Act                        Expected payoff
Build 25-bed facility        $33,106.00
Build 50-bed facility         11,173.00
Build 100-bed facility       -10,947.00

This indicates that if the survey is made and the results indicate a low demand,
the owner should choose the act, “build 25-bed facility,” since it yields the
maximum of the expected payoffs.

The research firm carries out the survey and estimates that demand will be
medium. What should the owner do? Since the expected payoffs are $35,458.00,
$47,924.50, and -$129.50 for the acts “build 25-bed facility,” “build 50-bed
facility,” and “build 100-bed facility,” respectively, he should build a 50-bed
one, since that yields the maximum expected payoff. If the sample estimate had
shown high demand, the act “build 100-bed facility” would yield the maximum
expected payoff. The outcome of a survey is uncertain. What we need is a single
figure that indicates what the expected payoff is if a survey is done and if we
select the optimal act after we get the survey estimates. We get such a figure by
multiplying the maximum payoffs for each sample result by the marginal
probabilities of these results and summing the products. For the present example,

$33,106.00(0.132) + $47,924.50(0.385) + $64,170.50(0.483) = $53,815.28 

This indicates that if a survey is conducted, the expected payoff (before the cost
of the survey is deducted) is $53,815.28. The maximum expected payoff (when a
50-bed facility is built) without benefit of a survey, as we have seen, is only $45,500.

Should the owner have the research firm carry out the survey? Yes, because
the expected payoff with a survey is greater than the expected payoff without one.
The difference is $53,815 - $45,500 = $8315. This figure is called the expected
value of sample information (EVSI). It indicates the most that a person should be
willing to pay for sample information. In this example, the survey is to cost $5000.
Since the expected net gain from sampling (ENGS) is $8315 - $5000 = $3315,
the owner should spend the money to have the survey done.
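
The whole preposterior analysis of Example 15.3.1 can be sketched in one short program. The payoffs, priors, conditional probabilities, and survey cost are taken from the example; the variable names are introduced here, and the code works with unrounded posterior probabilities rather than the four-decimal values of Table 15.3.7.

```python
# Payoffs from Table 15.3.3, by act and true event.
payoffs = {
    "build 25-bed":  {"E1": 30_000,  "E2": 35_000,  "E3": 40_000},
    "build 50-bed":  {"E1": -20_000, "E2": 50_000,  "E3": 55_000},
    "build 100-bed": {"E1": -40_000, "E2": -10_000, "E3": 75_000},
}
prior = {"E1": 0.1, "E2": 0.4, "E3": 0.5}
likelihood = {  # P(X_j | E_i), Table 15.3.5
    "E1": {"X1": 0.75, "X2": 0.15, "X3": 0.10},
    "E2": {"X1": 0.08, "X2": 0.80, "X3": 0.12},
    "E3": {"X1": 0.05, "X2": 0.10, "X3": 0.85},
}
survey_cost = 5_000


def best_act(probabilities):
    """Return (act, expected payoff) maximizing expected payoff under the given probabilities."""
    expected = {a: sum(probabilities[e] * payoffs[a][e] for e in probabilities) for a in payoffs}
    act = max(expected, key=expected.get)
    return act, expected[act]


# Prior analysis: best act without sampling.
_, payoff_without_survey = best_act(prior)

# Preposterior analysis: for each possible estimate, revise the probabilities,
# pick the best act, and weight its expected payoff by P(X_j).
payoff_with_survey = 0.0
plan = {}
for x in ("X1", "X2", "X3"):
    p_x = sum(prior[e] * likelihood[e][x] for e in prior)
    posterior = {e: prior[e] * likelihood[e][x] / p_x for e in prior}
    plan[x], conditional_payoff = best_act(posterior)
    payoff_with_survey += p_x * conditional_payoff

evsi = payoff_with_survey - payoff_without_survey  # expected value of sample information
engs = evsi - survey_cost                          # expected net gain from sampling

print(plan)
print(round(payoff_with_survey, 2), round(evsi, 2), round(engs, 2))
```

With unrounded posteriors the expected payoff with the survey comes to $53,815 rather than the $53,815.28 obtained in the text from rounded posteriors; the EVSI of $8315 and the ENGS of $3315 are unaffected.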

Figure 15.3.2 is a tree diagram for Example 15.3.1. This diagram compares 
the alternatives of (1) not collecting more information (not sampling), and (2) 
collecting more information (sampling). Note that the alternative of not sampling 
results in a decision situation of the type discussed in Section 15.2. 

The probabilities shown on the terminal branches of the “Not sampling” major 
branch of Figure 15.3.2 are the prior probabilities of the events “low demand”
(E_1), “medium demand” (E_2), and “high demand” (E_3). The probabilities shown
on the terminal branches of the “Sampling” major branch are revised probabilities 
based on the sample results shown in Table 15.3.7. 

Posterior Analysis

Once decision-makers have decided to sample, carried out the survey, and know
the results, they are ready to carry out posterior analysis. Posterior analysis is the
method whereby we combine sample information with prior information to obtain
revised probabilities for various events. This method of incorporating sample
information into the analysis is the same as that explained in the section on
preposterior analysis. In preposterior analysis, we found revised probabilities for
every possible sample outcome. In posterior analysis, we compute revised
probabilities for the single sample outcome that actually occurred. For example,
suppose that the owner of the nursing homes obtained from his sample survey an
estimate of medium demand. He would use the middle column of probabilities 
from Table 15.3.7. These are the only ones of interest to him now, since they are 
associated with the actual outcome of the survey. A decision-maker who is now 
ready to choose an act is interested in the expected payoffs. The owner need look 
only at the middle column of Table 15.3.8. He should choose the second act, 
“build a 50-bed facility,” since that act yields the largest expected payoff. 

Sometimes the costs of sampling may be slight. In this case the decision-maker 
would probably get sample information and do a posterior analysis, bypassing the 
preposterior step. 

FIGURE 15.3.2 Tree diagram for Example 15.3.1

Note that after an initial survey, a decision-maker may wish to look into the
possibility of getting still more sample information. If so, the posterior
probabilities become the prior probabilities for purposes of preposterior and posterior
analysis connected with the second sampling activity. One can continue this process
as long as it is profitable to do so. Such a course of action is a sequential
decision-making procedure.
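
The sequential idea, in which yesterday's posterior probabilities serve as today's priors, can be sketched as repeated application of Bayes' theorem. The example below is hypothetical: it assumes a second, independent survey with the same reliability as Table 15.3.5 that again estimates medium demand. No such second survey appears in Example 15.3.1, and the function name `bayes_update` is introduced here.

```python
def bayes_update(prior, prob_of_estimate_given_event):
    """Revise event probabilities after observing one sample estimate."""
    joint = {e: prior[e] * prob_of_estimate_given_event[e] for e in prior}
    total = sum(joint.values())  # marginal probability of the observed estimate
    return {e: p / total for e, p in joint.items()}


prior = {"E1": 0.1, "E2": 0.4, "E3": 0.5}

# P(X2 | E_i): the medium-demand column of Table 15.3.5.
p_medium_estimate = {"E1": 0.15, "E2": 0.80, "E3": 0.10}

# First survey estimates medium demand.
after_first = bayes_update(prior, p_medium_estimate)

# Hypothetical second survey (assumed independent, same reliability) also
# estimates medium demand: the first posterior now plays the role of the prior.
after_second = bayes_update(after_first, p_medium_estimate)

print({e: round(p, 4) for e, p in after_first.items()})
print({e: round(p, 4) for e, p in after_second.items()})
```

Whether a further survey is worth buying would still be judged by a preposterior analysis that uses the revised probabilities as priors.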

15.3.1 A farmer may choose one of three main crops, \(c_1\), \(c_2\), and \(c_3\), for planting in the 
spring. The yield of the crops, and consequently the income from them, depends on the 
weather conditions during the growing season. One can describe three categories of weather 
conditions, \(w_1\), \(w_2\), and \(w_3\). Condition \(w_1\) is better for \(c_1\) and unfavorable to \(c_2\) and \(c_3\), 
although not completely ruinous. The same relationship exists between \(w_2\), \(c_2\), \(c_1\), and \(c_3\); 
and \(w_3\), \(c_3\), \(c_1\), and \(c_2\). The farmer does not know which weather condition will prevail 
during the coming season. Therefore she must decide what crop to plant in an environment 
of uncertainty. She has been able to construct the following payoff table. (a) Compute 
the expected opportunity loss for each of the three acts. (b) Compute the EVPI. (c) 
What act do these computations pinpoint as the optimal act? (d) Which act has the highest 
expected payoff? (e) Compute the expected profit with perfect information. (f) Does the 
expected profit with perfect information, less the expected profit under uncertainty, equal 
the expected opportunity loss of selecting the optimum act in an uncertain environment? 
(g) Without further analysis, which crop would the farmer plant? (h) What is the 
maximum amount the farmer would be willing to pay for “sample” information? 

Event (weather                 Act (plant crop \(c_j\)) 
condition \(w_i\)      ----------------------------------------- 
will prevail)          \(c_1\)        \(c_2\)        \(c_3\)        \(P(w_i)\) 

\(w_1\)                $50,000        $30,000        $25,000        0.6 
\(w_2\)                 40,000         60,000         35,000        0.3 
\(w_3\)                 45,000         55,000         70,000        0.1 


15.3.2 An operator of a chain of coin-operated laundries is planning to open a new laundry 
in another section of the city. He is trying to decide whether to install 10, 15, or 25 
washing machines. He envisions three levels of demand that may exist in the area. He has 
been able to come up with subjective probabilities associated with each. His payoff table 
is as follows. Answer questions (a) through (f) as in Exercise 15.3.1. 

                                  Act: Install 
Event:        ---------------------------------------------------------------- 
Demand        10 machines (\(A_1\))   15 machines (\(A_2\))   25 machines (\(A_3\))   \(P(E_i)\) 

High          $6,000                  $10,000                 $15,000                 0.40 
Medium         5,000                    9,000                   8,000                 0.35 
Low            4,800                    3,000                   2,000                 0.25 


(g) Without further analysis, which act would the laundry operator pursue? (h) What is 
the maximum amount he would be willing to pay for “sample” information? (i) A 
research firm, for $1000, will conduct a survey to estimate demand. The firm provides the 
following data relative to past experience with similar surveys. 

                           The sample estimates the demand to be as follows 
                           the indicated proportion of times: 
When the                   High demand     Medium demand     Low demand 
true event is:             (\(X_1\))       (\(X_2\))         (\(X_3\)) 

High demand (\(E_1\))      0.82            0.15              0.03 
Medium demand (\(E_2\))    0.09            0.74              0.17 
Low demand (\(E_3\))       0.03            0.12              0.85 


Find the probability that each event is the true event and that the sample estimates it as 
such. That is, find the joint probabilities \(P(E_i \cap X_j) = P(E_i)P(X_j \mid E_i)\). (j) Compute the 
posterior probabilities of each event, given each sample estimate. (k) Use the posterior 
probabilities from step (j) to obtain the expected payoffs for each act, given each of the 
sample estimates. (l) Suppose that a survey is taken. What action should be taken if the 
survey estimates high demand? If the survey estimates medium demand? If the survey 
estimates low demand? (m) What is the expected payoff if a survey is done and the 
laundry operator chooses the optimal act after acquiring the survey estimates? (n) Should 
a survey be done? (o) What is the expected value of sample information? (p) In light 
of the analysis to this point, what should the operator do? 

15.3.3 Describe a business situation in which prior, preposterior, and perhaps posterior 
analysis would be appropriate. Use real or realistic data to provide the necessary inputs, 
and do a complete analysis as called for in Exercise 15.3.2. If appropriate, carry the 
analysis further to include a posterior analysis. 


15.4 UTILITY THEORY 

In Sections 15.2 and 15.3, we expressed payoffs—both profits and losses—in 
dollar amounts. Clearly, however, not all decision-makers attach the same value 
to money. Consider Business Person A, who has total capital of $100,000. An 
investment opportunity arises that requires an expenditure of $100,000. This 
investment will either double the money or leave A with none. That is, A will either 
gain $100,000 or lose all the capital. The probability of the latter event is 0.25. 
The probability of the former is 0.75. The expected profit from this investment 
is then 


0.75($100,000) + 0.25(-$100,000) = $50,000 

Also available to A is another investment opportunity that requires an expenditure 
of $100,000. The profit on this investment is a certain $20,000. 

Consider another business person, B , whose capital assets are $10,000,000. 
Suppose that B has the same investment opportunities as A. We would not expect 
identical behavior on the part of A and B. Whereas A stands to lose everything 
by electing to take “advantage” of the opportunity involving a gamble, B stands 
to lose only 1% of total capital by deciding on this course of action. Thus $100,000 
appears to represent different levels of value to these two people. We would expect 
A to forego the gamble and B to take it. 

By following this line of reasoning, we have, by implication, cast A and B in 
the roles of prudent people. Suppose, on the other hand, that A is a pathological 
gambler and B is conservative to a fault. We might then find A passing up a sure 
$20,000 out of preference for a 0.75 chance of realizing a profit of $100,000. 
Likewise B, if conservative enough, would find the chance that we have described 
unthinkable. 

Therefore we may conclude that the actions a decision-maker takes depend on 
at least two conditions: level of assets and attitude toward risk. We know that 
these conditions are not the same for all decision-makers. 



Absolute dollar amounts of money may not be adequate criteria on which to 
base decisions of this sort. This fact was recognized as far back as 1738, when 
Daniel Bernoulli (1738) concluded that not all people use the same rule when 
evaluating a gamble. Bernoulli proposed that the value of an item should be 
determined by the utility it yields, not by its price. The concept of utility presented 
here is based on the writings of von Neumann and Morgenstern (1944) and their 
followers. A person’s utility is determined by the preference that person exhibits 
for the choices available in circumstances involving risk. In fact, some authorities 
[such as Hammond (1967)] prefer the term preference theory in referring to the 
concept we shall call utility theory. 

The following sections show how to incorporate utility theory into a 
decision-making procedure. 


The Standard Gamble 


Suppose that we confront an executive with a situation in which she may expend 
$50,000 in one of two investments: (1) Investment A will yield a profit of $100,000 
with probability 0.5 and a loss of the $50,000 with probability 0.5. (2) Investment 
B will yield a profit of $10,000 with probability 1. 

For reasons that will soon become apparent, we need, at this point, to assign 
an arbitrary utility index to the amounts of $100,000 and -$50,000 specified in 
the gamble. The utility index is a subjective measure of the decision-maker’s 
preference for an outcome of some action. We’ll assign a utility index of 0 to 
-$50,000 and a utility index of 1 to $100,000. We express this symbolically as 
U($100,000) = 1 and U(-$50,000) = 0. We could have used any other 
numbers, so long as the index for $100,000 was greater than the index for -$50,000. 

Now we ask the executive which investment she would prefer, A or B. Suppose 
that she prefers Investment B. We then ask for what probability of receiving 
$100,000 her choice would be Investment A. She may tell us that if the probability 
of a profit of $100,000 were 1 (and consequently the probability of losing $50,000 
were 0) she would prefer Investment A. We then try to determine whether there 
is some lesser probability for which she would take the gamble. In fact, we try 
to find some probability for which she is indifferent. The point of indifference is 
represented by the probability figure p such that, for any probability greater than 
p, the decision-maker would prefer the investment involving a gamble. For any 
probability less than p, she would prefer the investment with a certain profit. Let 
us suppose that the value of p is 0.85. We can now compute the decision-maker’s 
utility index for $10,000, the sure profit associated with Investment B. We can 
do this on the assumption that the point between the two choices at which the 
decision-maker is indifferent is the point at which the expected utilities of the 
choices are the same. Thus 

U($10,000) = 0.85[U($100,000)] + 0.15[U(-$50,000)] 

           = 0.85(1) + 0.15(0) = 0.85 

We persuade the executive to play the game once more. We ask her what value 
of p would make her indifferent between the gamble and a sure profit of $50,000. 
Suppose that she says that p in this case is 0.95. From this information, we 
calculate the utility of $50,000 to be 

U($50,000) = 0.95(1) + 0.05(0) = 0.95 

What if we change the consequences of Investment B so that now this act yields 
a certain loss of $20,000? She must now arrive at a value of p for the gamble. 
She must tell us what likelihood of realizing a profit of $100,000 she would require 
before she would be indifferent between choosing the sure loss of $20,000 and 
taking the gamble. We would expect the value of p to be lower than for the 
situations mentioned previously. Assume that she says that the probability of 
getting the $100,000 on the gamble would have to be 0.52. The utility for -$20,000 
is then 

U(-$20,000) = 0.52(1) + 0.48(0) = 0.52 

In a similar manner, we can determine the decision-maker’s utility for any sum 
of money between -$50,000 and $100,000. 


The Utility Function 


The conversation with the decision-maker results in the following set of monetary 
amounts and their associated utilities: (-$50,000, 0), ($100,000, 1), ($10,000, 
0.85), ($50,000, 0.95), and (-$20,000, 0.52). We can plot these on a 
utility-money graph as shown in Figure 15.4.1. We may find additional points and 
connect them to produce a curve that corresponds to the decision-maker’s utility 
function. Assume that the utility curve for our decision-maker in this example is 
that in Figure 15.4.1. With the utility curve, we can determine her utility for any 
amount between -$50,000 and $100,000. We do this by locating the dollar 
amount on the horizontal axis of Figure 15.4.1, moving up to the curve, and 
moving across to the utility on the vertical axis. For example, the utility for 
$30,000 is 0.92. In general, a point on the vertical axis (a utility) means that the 
decision-maker is indifferent between (1) having for certain the amount of money 
on the horizontal axis below the point on the curve corresponding to that utility, 
and (2) taking a gamble involving the highest amount on the horizontal axis with 
probability equal to the utility value, and the lowest amount on the horizontal axis 
with probability equal to 1 minus the utility value. The decision-maker’s utility 
for an amount is equal to the probability we asked her to assign to the larger 
amount in the gamble. This is the convenience we achieved by assigning a utility 
index of 1 to $100,000 and 0 to -$50,000. 

FIGURE 15.4.1 Utility-money graph for Example 15.4.1 
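
The elicitation described above can be sketched in Python. The indifference probabilities are the ones stated in the text; the function names and the straight-line interpolation between elicited points are assumptions made for illustration, standing in for the smooth curve that the text reads from Figure 15.4.1.

```python
from bisect import bisect_right

WORST, BEST = -50_000, 100_000   # reference outcomes of the gamble
U_WORST, U_BEST = 0.0, 1.0       # arbitrary utility indexes chosen in the text


def utility_from_indifference(p):
    """Utility of a sure amount, given the indifference probability p."""
    return p * U_BEST + (1 - p) * U_WORST


# Indifference probabilities stated by the executive in the text.
indifference = {10_000: 0.85, 50_000: 0.95, -20_000: 0.52}

points = {WORST: U_WORST, BEST: U_BEST}
points.update({x: utility_from_indifference(p) for x, p in indifference.items()})
amounts = sorted(points)


def utility(x):
    """Assumption: straight-line interpolation between elicited points."""
    if not amounts[0] <= x <= amounts[-1]:
        raise ValueError("amount outside the elicited range")
    if x == amounts[-1]:
        return points[x]
    i = bisect_right(amounts, x) - 1          # segment that contains x
    x0, x1 = amounts[i], amounts[i + 1]
    u0, u1 = points[x0], points[x1]
    return u0 + (u1 - u0) * (x - x0) / (x1 - x0)


for x in amounts:
    print(f"{x:>8,}  {points[x]:.2f}")
print(round(utility(30_000), 2))
```

Because the indexes 0 and 1 were assigned to the two reference amounts, each utility equals its indifference probability. Straight-line interpolation places the utility of $30,000 at about 0.90, slightly below the 0.92 read from the drawn curve, since a smooth concave curve lies above the chords joining its points.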

Note that the utility curve generated by the decision-maker is not a straight line, 
but a curve that is concave downward. From this we may conclude that she tends 
to be conservative. This type of curve is characteristic of the risk avoider. A 
decision-maker whose utility curve is a straight line is known as an averages 
player, or a risk-neutral decision-maker. However, the decision-maker with a 
utility curve that is concave upward is one who tends to prefer the gamble to the 
sure thing. The gambler’s utility curve is in general concave upward. This type 
of decision-maker is called a risk preferer. Figure 15.4.2 illustrates these three 
types of utility curves. 
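
As a rough check on the shape of a utility curve, one can compare the slopes of successive segments between elicited points. The sketch below is an assumption-laden illustration: the function name `risk_attitude`, the tolerance, and the two comparison curves are hypothetical, and only the executive's five points come from the text.

```python
def risk_attitude(points, tol=1e-12):
    """Classify a utility curve from (money, utility) points.

    Falling segment slopes mean the curve is concave downward;
    rising slopes mean it is concave upward.
    """
    pts = sorted(points)
    slopes = [(u1 - u0) / (x1 - x0) for (x0, u0), (x1, u1) in zip(pts, pts[1:])]
    diffs = [b - a for a, b in zip(slopes, slopes[1:])]
    if all(abs(d) <= tol for d in diffs):
        return "risk-neutral"
    if all(d <= tol for d in diffs):
        return "risk avoider"
    if all(d >= -tol for d in diffs):
        return "risk preferer"
    return "mixed"


# The executive's points from the text.
executive = [(-50_000, 0.0), (-20_000, 0.52), (10_000, 0.85),
             (50_000, 0.95), (100_000, 1.0)]
# Hypothetical curves for comparison (not from the text).
straight_line = [(0, 0.0), (50, 0.5), (100, 1.0)]
concave_upward = [(0, 0.0), (50, 0.2), (100, 1.0)]

print(risk_attitude(executive))
print(risk_attitude(straight_line))
print(risk_attitude(concave_upward))
```

A result of "mixed" would correspond to the variations on the basic shapes, where a curve is concave downward over one range of amounts and concave upward over another.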

These are not the only types of utility curves that decision-makers can generate. 
There are many variations on these basic shapes. [See Swalm (1966) for examples.] 

In general, a decision-maker wants to maximize expected utility rather than 
expected monetary value. However, when a decision-maker’s utility curve is 
linear, the act that maximizes expected utility and the act that maximizes expected 
monetary value are the same. For those decisions in which the money involved 
is small relative to the assets of the organization, we can usually consider the 
utility curves for these decisions to be linear. Thus maximizing expected monetary 
value is usually a valid procedure. If, however, the money amounts are relatively 
large, we may have no basis for assuming a linear utility curve. In such cases, 
the decision-maker should construct a utility curve that is specific to the situation 
and choose the act that maximizes expected utility. 
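
The difference between the two criteria can be illustrated with the executive's two investments from the standard-gamble discussion. The payoffs, probabilities, and utilities below are taken from the text; the helper name `expected_value` and the dictionary layout are assumptions of this sketch.

```python
def expected_value(outcomes, value=lambda x: x):
    """Probability-weighted average of value(payoff) over (prob, payoff) pairs."""
    return sum(p * value(x) for p, x in outcomes)


# Utilities elicited from the executive in the text.
UTILITY = {-50_000: 0.0, 10_000: 0.85, 100_000: 1.0}

acts = {
    "Investment A": [(0.5, 100_000), (0.5, -50_000)],  # the gamble
    "Investment B": [(1.0, 10_000)],                   # the sure profit
}

emv = {name: expected_value(o) for name, o in acts.items()}
eu = {name: expected_value(o, lambda x: UTILITY[x]) for name, o in acts.items()}

best_by_money = max(emv, key=emv.get)
best_by_utility = max(eu, key=eu.get)

print(emv, best_by_money)
print(eu, best_by_utility)
```

Expected monetary value favors the gamble ($25,000 against $10,000), while expected utility favors the sure profit (0.85 against 0.5), which agrees with the executive's stated preference for Investment B.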


FIGURE 15.4.2 Three types of utility curve 


Assumptions Underlying Utility Theory 


The validity of utility theory as we have presented it here depends on the following 
assumptions: 

1. When confronted with two alternatives such as those we have discussed, the 
decision-maker is either indifferent between the two, prefers the first to the second, 
or prefers the second to the first. 

2. When a decision-maker prefers alternative A to alternative B, and prefers 
alternative B to alternative C, then he or she prefers alternative A to alternative 
C. This is called transitivity of preference. 

3. When the decision-maker prefers a gamble to some outcome A when p = 1, 
but prefers A to the gamble when p = 0, then there exists some value of p such 
that the decision-maker is indifferent between the gamble and the outcome A. 
This is referred to as the continuity of preference assumption. 

4. When a decision-maker is indifferent as to the choice between two acts, then 
one may be substituted for the other, and the utilities of the two acts can be 
considered equal. This is known as the principle of substitution. 

5. When two gambles have identical payoffs, but the more attractive payoff of 
one has a probability different from the more attractive payoff of the other, the 
decision-maker will prefer the gamble with the higher such probability. 

Prescriptive Versus Descriptive Role of Utility Theory 

Some authorities, for example Swalm (1966), feel that utility theory is well suited 
to describing the behavior of the decision-maker. Thus the proper use of utility 
theory lets one predict how a decision-maker will behave in a given situation. On 
the other hand, there are those who see utility theory as directing or prescribing 
the way a decision-maker should behave. This concept of the role of utility theory 
is found in the article by Hammond (1967). 

[The articles by Swalm (1966) and Hammond (1967) are easily understood, 
concise introductions to utility theory. The article by Fishburn (1968) is also good, 
and somewhat more advanced; it also has a useful bibliography. The following 
authors’ textbooks cover the topic and are quite readable: Dyckman et al. (1969), 
Baumol (1972), Schlaifer (1981), and Bierman et al. (1977). More advanced 
treatments may be found in Luce and Raiffa (1957) and von Neumann and 
Morgenstern (1980).] 


15.5 BAYESIAN DECISION THEORY AND CLASSICAL 
STATISTICAL INFERENCE 

Bayesian decision theory and classical inference both have the same objective— 
guiding a decision-maker to choose the best of two or more courses of action. 
There are, however, some differences in their methods and their philosophical 
foundations. Some of the points on which there is a difference of opinion between 
the classicists and the Bayesians include the following. 



1. Subjective probability. Bayesian decision-makers readily use subjective 
probabilities whenever they feel the situation requires them. A frequent source of prior 
probabilities is the decision-maker’s best evaluation of the likelihood of certain 
events. Classicists cannot accept the validity and usefulness of subjective 
probabilities. They require that probabilities be generated by objective means. 

2. The choice of the significance level in hypothesis testing. Advocates of the 
Bayesian approach contend that classicists often arbitrarily select a significance 
level without putting much thought into the underlying assumptions or 
consequences. 

3. Costs. The Bayesians also criticize the classicists because they do not 
incorporate cost considerations into the hypothesis-testing and estimation procedures. 


Summary 


This chapter introduced some fundamental notions in statistical decision theory. 
In addition to the references already mentioned, the article by Hirschleifer (1961) 
is recommended for the novice in this area. The following papers report on the 
use of statistical decision theory in several areas. 

In the area of marketing research, Green (1964) presents the decision-theory 
approach as an alternative to a classical procedure in distinguishing between high- 
and low-sales-potential customers. Other marketing-oriented articles are those by 
Green (1963), Roberts (1963), and Barker (1972). 

Corless (1972), Sorensen (1969), and Tracy (1969a, 1969b) have contributed 
papers on the application of Bayesian analysis to the field of accounting. 
Sorensen’s article discusses sampling invoices for errors. The problem dealt with by 
Tracy is payroll errors. 

Phillips and Dawson (1968) show how a retail manager can use Bayesian 
analysis to calculate order quantities and reorder points to improve inventory control. 
Taylor (1969) discusses the use of Bayesian analysis in solving the problem of 
the age at which to replace items in a stock of equipment. For applications to 
quality control, see the articles by Carter (1972) and Hamburg (1962). 

An article by Olson (1968) deals with the field of transportation. The paper by 
Owens (1968) deals with academic administration (the registration of student 
organizations). 

Two articles by Borck (1968a, 1968b) deal with applications to operations 
research. If you are interested in real estate, an article by Ratcliff and Schwab 
(1970) gives a general introduction to Bayesian decision theory followed by a 
model of a real-estate investment decision. Dowds (1972) shows the application 
of Bayesian decision methods to an oil-well-drilling project. 

Brown (1970) wanted to find the extent to which business people actually do 
use decision-theory analysis. He surveyed 20 companies that had been exploring 
various applications of decision-theory analysis, and concluded: 

1. Only a few companies had used decision-theory analysis for any length of 
time. 

2. Its use increased sharply between 1964 and the time of the survey (1969). 

3. Companies using decision-theory analysis did not drastically change their 
general decision-making procedures as a result, although individual decisions had 
often been affected. 

4. There was no firm evidence to prove the widespread practical value of 
decision-theory analysis. 

5. The potential of the technique is great, but some major difficulties remain to 
be solved. 


Review Questions 


1. Define: (a) payoff, (b) event, (c) state of nature, (d) payoff table. 

2. Explain: (a) the maximin criterion, (b) the minimax criterion, (c) the maximax 
criterion, (d) the Hurwicz criterion, (e) the Bayes criterion. 

3. Explain or define: (a) prior analysis, (b) preposterior analysis, (c) posterior 
analysis, (d) expected profit under uncertainty, (e) expected profit with perfect information, 
(f) expected value of perfect information, (g) cost of uncertainty, (h) expected value of 
sample information. 

4. What is meant by utility theory? 

5. What is meant by utility index? 

6. How is a utility function constructed? 

7. How is a utility function used? 

8. Describe the utility curve of: (a) the risk avoider, (b) the risk-neutral 
decision-maker, (c) the risk preferer. 

9. What assumptions underlie utility theory? 

10. Locate in a business-oriented journal an article discussing an application of statistical 
decision theory. Write a critique of the article. 

11. Describe a situation in your area of interest in which Bayes’ criterion could be applied. 
Use realistic data, and carry out the calculations necessary to arrive at a decision. 

12. A small soft-drink company is about to begin operation, with distribution limited to 
a single state. The management wishes to know whether the company should use “no 
deposit-no return” bottles or returnable bottles. There is a rumor that the state legislature 
may pass a law banning no-return bottles. If it does, and if the company has decided to 
use no-return bottles, the switch to returnable bottles will be expensive. Management’s 
preliminary payoff table is as follows. (Payoffs are in millions of dollars.) (a) Compute 
the expected payoff for each of the two acts. (b) Compute the EVPI. (c) What is the 
optimal act? (d) Which act has the higher expected payoff? (e) Without further analysis, 
which act would management follow? (f) What is the maximum amount management 
would pay for “sample” information? 

                                       Act 
                       ------------------------------------ 
                       Use no-return     Use returnable 
Event (\(E_i\))        bottles (\(A_1\))  bottles (\(A_2\))   \(P(E_i)\) 

Law is passed          $10               $15                 0.8 
Law is not passed       20                15                 0.2 


16. Some Statistical Applications in Quality Control 

Chapter Objectives: In this chapter you become acquainted with an important 
area of business in which statistical methodology is used to great advantage. 
You will learn additional ways in which to apply the basic concepts and 
techniques that you learned in the preceding chapters. After studying this 
chapter and working the exercises, you should be able to do the following. 

1. Construct and use control charts for variables 

2. Construct and use control charts for attributes 

3. Develop and use acceptance sampling plans for 
attributes 

4. Develop and use acceptance sampling plans for 
variables 



16.1 INTRODUCTION 


The concept of quality control, in its broadest sense, has many facets. Quality 
control is a concern of most, if not all, of the areas of a business organization. 
In fact, the life and health of a business depend on the quality of the product it 
produces. It is no wonder, then, that management is concerned with measuring 
and controlling the quality of the product. 

Management should establish company policy on the quality of the product. 
The standard must be based on the performance of the product in actual use by 
customers. It must be at a level that is acceptable to customers, comparable to 
that of competitors, and economically feasible from the standpoint of cost of 
production and service. 

Design engineers must have quality in mind when they design a product. Those 
responsible for buying raw materials must keep an eye on quality. (The quality 
of the finished product depends on the quality of the raw materials from which it 
is made.) The accounting department watches over the cost of achieving high 
quality. The marketing department is deeply concerned with the quality of the 
product the company places on the market. Thus companies strive at all levels to 
obtain the “best” balance between cost and quality. 

This concern with good quality—and its measurement and control—is present 
in a business organization whether or not it has a formal quality-control 
department. When a business does have a quality-control department, however, it 
performs a variety of functions. It collects samples, takes measurements, performs 
tests, makes arithmetical and statistical computations, keeps records, prepares 
reports, and makes decisions. 

This chapter presents some of the statistical concepts and techniques that are 
used in the statistical component of quality control. Space does not permit an 
exhaustive cataloguing of the concepts and techniques, nor a complete treatment 
of those ideas that we do present. [For additional reading in this area of statistical 
application, see the books by Cowden (1957), Duncan (1974), Grant and 
Leavenworth (1980), Hansen (1963), Knowler et al. (1969), Peach (1964), and Samson 
et al. (1970).] 


16.2 CONTROL CHARTS—VARIABLES 

Walter A. Shewhart of the Bell Telephone Laboratories, in a memorandum dated 
May 16, 1924, introduced the concept of the control chart. [See Edwards (1947).] 
Further development of the technique is chronicled in a number of articles by 
Shewhart (1926a, 1926b, 1927). In 1931 Shewhart published a landmark book in 
the area, Economic Control of Quality of Manufactured Product. 

The control chart is a decision-making device that gives the user information 
about the quality of product resulting from a manufacturing process. A control 
chart usually consists of three horizontal lines. The top line represents the upper 
control limit, the bottom line the lower control limit, and the center line an 
acceptable average for the process based on specifications or historical data. The 
control chart is constructed in such a way that we can plot the results of assessing 
the quality of the manufactured product through periodic monitoring of the 
manufacturing process. Each time the process is monitored, a point is placed on the 
control chart. As long as the points fall within the two control limits, we do not 
question the quality of the product. But when a plotted point falls outside the 
control limits, this alerts the production manager to the possibility that the quality 
of the product is unacceptable. 

Figure 16.2.1 shows a basic control chart. 

In any manufacturing process, not all items produced are exactly alike. The 
various forces and conditions of the manufacturing environment result in variations 
from item to item in any measurement that we can take. There are two types of 
variations in quality control: variation due to chance and variation due to some 
assignable cause. Once they are constructed, we consult control charts from time 
to time to determine the type of observable variation present. If the control chart 
indicates that the observed variation is due to chance alone, we say that the process 
is “in control.” On the other hand, if the control chart indicates that the observed 
variation is not due to chance—that is, if it indicates that some other cause seems 
to be operating—we conclude that the process is “out of control.” In this event 
we halt the process. Then we look for and correct possible causes. 

We keep separate variable-control charts for each variable (measurement) of 
interest. We use them to indicate whether, for these variables, the manufacturing 
process is in control with respect to the population mean and population dispersion. 
A control chart for the mean is called an \(\bar{x}\) chart. When control of dispersion is 
the objective, the measure used is the range. The resulting chart is called an R 
chart. 

When we construct control charts for detecting a shift in population dispersion, 
we use the range, rather than the standard deviation of the sample. The reason is 
that the range is easier to compute. Furthermore, persons not trained in statistics 
understand the range better than the standard deviation. Burr (1953) suggests that 
standard deviations be used when n > 10 and/or when each measurement is 
comparatively expensive and we want the maximum information from the data. 
Remember that the construction of R charts as presented here assumes that the 
distribution of values making up the population is normal. However, good results 
have been reported even when this assumption does not hold. 

FIGURE 16.2.1 A basic control chart 

The basic steps in constructing and utilizing variable-control charts are as 
follows. 

1. Select a random sample of n items from the manufacturing process and take 
measurements \(x_1, x_2, \ldots, x_n\). 

2. Compute the sample mean and range as follows: 

\[
\bar{x} = \frac{\sum x_i}{n}, \qquad R = x_{\max} - x_{\min}
\]

where \(x_{\max}\) and \(x_{\min}\) are the largest and smallest measurements in the sample. 

3. If you feel that the process is stable, select k successive samples and compute 
the following values: 

\[
\bar{\bar{x}} = \frac{\sum \bar{x}}{k}, \qquad \bar{R} = \frac{\sum R}{k}
\]

You usually choose a value of k between 20 and 30. The value of \(\bar{\bar{x}}\), which is an 
estimate of \(\mu_{\bar{x}} = \mu\), becomes the center horizontal line on the \(\bar{x}\) chart, and \(\bar{R}\) 
becomes the center horizontal line on the R chart. 

4. Compute lower and upper control limits for \(\bar{x}\) as follows: 

\[
\mathrm{LCL}_{\bar{x}} = \bar{\bar{x}} - A_2\bar{R}, \qquad \mathrm{UCL}_{\bar{x}} = \bar{\bar{x}} + A_2\bar{R}
\]

You can find the constant \(A_2\) in Table 16.2.1. 

Remember these essential principles of the sampling distribution of sample 
means: 

(a) When the population from which the samples are drawn is normally 
distributed, the distribution of sample means is normal, with a mean \(\mu_{\bar{x}}\) equal to the 
population mean and a standard deviation equal to \(\sigma/\sqrt{n}\). 

(b) If the sample size is large, the central limit theorem tells us that, regardless 
of the functional form of the parent population (provided that it has a mean and 
a finite variance), the distribution of sample means that we can compute from 
samples drawn from such a population is approximately normal, with \(\mu_{\bar{x}} = \mu\) 
and \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\). Although a sample size of about 30 is generally thought 
necessary for a satisfactory approximation, some people use the approximation for 
samples as small as 5 or 10. [See Grant and Leavenworth (1980).] 

(c) Approximately 68% of the area lies under the curve of \(f(\bar{x})\) and above the \(\bar{x}\) 
axis between the points \(\mu_{\bar{x}} \pm 1\sigma_{\bar{x}}\). Approximately 95% of the area lies between 
the points \(\mu_{\bar{x}} \pm 2\sigma_{\bar{x}}\). And approximately 99.7% of the area lies between the points 
\(\mu_{\bar{x}} \pm 3\sigma_{\bar{x}}\). These points are referred to as the 1, 2, and 3 sigma limits on \(\bar{x}\). The 
probability that a single sample picked at random from the population will yield 
an \(\bar{x}\) within these limits is equal to the area under the curve between the points 
defining the limits. In control limits for \(\bar{x}\), \(A_2\bar{R}\) is an estimate of \(3\sigma_{\bar{x}}\). Therefore 
the control limits specified above are called the 3 sigma control limits for \(\bar{x}\). 


TABLE 16.2.1 Factors useful in the construction of control charts 

       CHART FOR AVERAGES    CHART FOR RANGES 
       Factor for            Factor for      Factors for 
       control limit         central line    control limits 
 n     \(A_2\)               \(d_2\)         \(D_3\)     \(D_4\)     \(d_3\) 

 2     1.880                 1.128           0           3.267       0.8525 
 3     1.023                 1.693           0           2.575       0.8884 
 4     0.729                 2.059           0           2.282       0.8798 
 5     0.577                 2.326           0           2.115       0.8641 
 6     0.483                 2.534           0           2.004       0.8480 
 7     0.419                 2.704           0.076       1.924       0.833 
 8     0.373                 2.847           0.136       1.864       0.820 
 9     0.337                 2.970           0.184       1.816       0.808 
10     0.308                 3.078           0.223       1.777       0.797 
11     0.285                 3.173           0.256       1.744       0.787 
12     0.266                 3.258           0.284       1.716       0.778 
13     0.249                 3.336           0.308       1.692       0.770 
14     0.235                 3.407           0.329       1.671       0.762 
15     0.223                 3.472           0.348       1.652       0.755 
16     0.212                 3.532           0.364       1.636       0.749 
17     0.203                 3.588           0.379       1.621       0.743 
18     0.194                 3.640           0.392       1.608       0.738 
19     0.187                 3.689           0.404       1.596       0.733 
20     0.180                 3.735           0.414       1.586       0.729 
21     0.173                 3.778           0.425       1.575       0.724 
22     0.167                 3.819           0.434       1.566       0.720 
23     0.162                 3.858           0.443       1.557       0.716 
24     0.157                 3.895           0.452       1.548       0.712 
25     0.153                 3.931           0.459       1.541       0.709 

Values of \(d_2\) and \(d_3\) are from E. S. Pearson, “The Percentage Limits for the Distribution of Range 
in Samples from a Normal Population,” Biometrika, XXIV (1932), p. 416. Used by permission of 
the Biometrika trustees. 

\(A_2 = 3/(d_2\sqrt{n})\), \(D_3 = 1 - 3(d_3/d_2)\), \(D_4 = 1 + 3(d_3/d_2)\) 

5. Compute the lower and upper 3 sigma control limits for R as follows: 

\[
\mathrm{LCL}_R = D_3\bar{R}, \qquad \mathrm{UCL}_R = D_4\bar{R}
\]

Values for \(D_3\) and \(D_4\) are found in Table 16.2.1. 

6. Compute lower and upper 3 sigma process tolerance limits for individual values 
of x as follows: 

\[
\mathrm{LTL}_x = \bar{\bar{x}} - \frac{3\bar{R}}{d_2}, \qquad \mathrm{UTL}_x = \bar{\bar{x}} + \frac{3\bar{R}}{d_2}
\]

Again the constant \(d_2\) is found in Table 16.2.1. 



These limits are usually called the natural tolerance limits for the process. When 
a process is in good control, these limits include almost all the individual values. 
If these limits are within the manufacturer’s specifications, we may conclude that 
most of the items are of satisfactory quality. If they are outside the manufacturer’s 
specifications, we question the quality of the product. 
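
The six steps above can be sketched in Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the three samples are made-up measurements (far fewer than the 20 to 30 samples the text recommends), and a negative value of the \(D_3\) formula is replaced by 0, which is how Table 16.2.1 lists it for small n. Only the \(d_2\) and \(d_3\) values are copied from the table.

```python
from math import sqrt
from statistics import mean

# d2 and d3 for small sample sizes, copied from Table 16.2.1.
d2_table = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}
d3_table = {2: 0.8525, 3: 0.8884, 4: 0.8798, 5: 0.8641}


def chart_factors(n):
    """Return (A2, D3, D4, d2) from the formulas beneath Table 16.2.1."""
    d2, d3 = d2_table[n], d3_table[n]
    a2 = 3 / (d2 * sqrt(n))
    lower = max(0.0, 1 - 3 * d3 / d2)   # a negative value is replaced by 0
    upper = 1 + 3 * d3 / d2
    return a2, lower, upper, d2


def control_limits(samples):
    """Center lines and 3 sigma limits from k samples of equal size n."""
    n = len(samples[0])
    a2, d3_factor, d4_factor, d2 = chart_factors(n)
    x_bars = [mean(s) for s in samples]            # step 2: sample means
    ranges = [max(s) - min(s) for s in samples]    # step 2: sample ranges
    grand_mean, r_bar = mean(x_bars), mean(ranges) # step 3: center lines
    return {
        "xbar_chart": (grand_mean - a2 * r_bar, grand_mean, grand_mean + a2 * r_bar),
        "r_chart": (d3_factor * r_bar, r_bar, d4_factor * r_bar),
        "tolerance": (grand_mean - 3 * r_bar / d2, grand_mean + 3 * r_bar / d2),
    }


# Hypothetical measurements: three samples of size 4 (not from the text).
samples = [[2, -1, 3, 0], [1, 4, -2, 1], [0, -3, 2, 5]]
limits = control_limits(samples)
for name, values in limits.items():
    print(name, [round(v, 3) for v in values])
```

For n = 4 the computed factors round to the tabled values \(A_2 = 0.729\) and \(D_4 = 2.282\), which is a quick way to confirm that the formulas and the table agree.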

In constructing \(\bar{x}\) and R charts, the most frequently used sample size is either 
4 or 5. [Duncan (1974) summarizes the arguments in favor of these choices of 
sample size.] The following example illustrates the construction and use of \(\bar{x}\) and 
R charts. 


EXAMPLE 16.2.1 Consider a manufacturing process that produces 20 bolts every 
minute. Table 16.2.2 shows the measurements of each bolt in terms of the 
deviation above or below 3.000 inches. We use the first 25 time periods to establish 
control limits. We draw a random sample of size 4 from each time period. We 
then calculate the mean and range for each sample. Table 16.2.3 shows the results. 

From the data in Table 16.2.3, we compute the following: 


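$$\bar{\bar{x}} = \frac{1.0 + 0.75 + \cdots + (-1.0)}{25} = \frac{5.25}{25} = 0.21$$

$$\bar{R} = \frac{7 + 11 + \cdots + 4}{25} = \frac{236}{25} = 9.44$$

$$\mathrm{LCL}_{\bar{x}} = 0.21 - 0.729(9.44) = 0.21 - 6.88 = -6.67$$

$$\mathrm{UCL}_{\bar{x}} = 0.21 + 0.729(9.44) = 7.09$$

$$\mathrm{LCL}_R = (0)(9.44) = 0, \qquad \mathrm{UCL}_R = (2.282)(9.44) = 21.54$$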


Figure 16.2.2 shows the resulting control charts, along with the plotted values of
$\bar{x}$ and R computed from the first 25 samples.
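The computations of this example can be reproduced with a short script. The sketch below is illustrative only; the sample means and ranges are transcribed from the first 25 rows of Table 16.2.3, the factors 0.729, 0, and 2.282 are those used in the example for samples of size 4, and the variable names are our own.

```python
# Sample means and ranges for samples 1-25, transcribed from Table 16.2.3
means = [1.0, 0.75, -3.75, 0.5, 5.25, -0.25, 0.0, -0.5, -2.25, -1.75,
         -6.0, 0.5, 3.25, -3.5, 1.5, 3.25, -1.75, 0.0, 1.75, 3.0,
         4.0, 1.25, 1.5, -1.5, -1.0]
ranges = [7, 11, 15, 8, 14, 7, 4, 9, 12, 9,
          17, 5, 8, 10, 12, 15, 9, 9, 9, 20,
          5, 2, 8, 7, 4]

A2, D3, D4 = 0.729, 0.0, 2.282  # factors for n = 4, as used in the example

grand_mean = sum(means) / len(means)      # center line of the mean chart
mean_range = sum(ranges) / len(ranges)    # center line of the R chart

lcl_x = grand_mean - A2 * mean_range
ucl_x = grand_mean + A2 * mean_range
lcl_r = D3 * mean_range
ucl_r = D4 * mean_range

print(round(grand_mean, 2), round(mean_range, 2))
print(round(lcl_x, 2), round(ucl_x, 2), round(lcl_r, 2), round(ucl_r, 2))
```

The rounded results agree with the hand computation above: center lines 0.21 and 9.44, and control limits -6.67, 7.09, 0, and 21.54.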

None of the sample values fall outside the control limits. This indicates that 
the process is under control during this period. Therefore we can use the central 
lines and control limits of these charts as a standard for future reference. Let us 
now continue to draw samples of size 4 from each time period and observe whether 
or not the process continues to be in control. We sample time periods 26 through 
40 (shown in Table 16.2.2). Table 16.2.3 shows the mean and range for each of 
these samples. Figures 16.2.3 and 16.2.4 show the locations of the sample values
of $\bar{x}$ and R on the control charts.

Samples 29, 35, and 38 produce means that fall outside the control limits. We
assume that each mean falling outside the control limits was taken as an
indication that the process mean had changed, that the process was stopped
each time, and that an assignable cause for the change was found and
corrected.

The process variation seems to have changed at the time of drawing sample 
number 38, since this sample value of R falls outside the control limit for R. 
Again, we assume that an assignable cause was found and corrected. 
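The check of samples 26 through 40 against the established limits can be sketched as follows. This is an illustration, not part of the original example: the means and ranges are transcribed from Table 16.2.3, the limits are the rounded values computed from the first 25 samples, and the variable names are our own.

```python
# Means and ranges for samples 26-40, transcribed from Table 16.2.3
first_sample = 26
means = [1.5, 6.25, 0.5, 8.0, 0.75, -0.25, 1.25, -0.25, 0.0, -14.75,
         3.25, 0.25, -13.75, 4.0, 0.5]
ranges = [15, 13, 12, 17, 9, 9, 6, 3, 14, 7, 6, 13, 40, 13, 7]

# Control limits established from samples 1-25
LCL_X, UCL_X = -6.67, 7.09
LCL_R, UCL_R = 0.0, 21.54

# Sample numbers whose mean or range falls outside the corresponding limits
out_of_control_means = [first_sample + i for i, m in enumerate(means)
                        if m < LCL_X or m > UCL_X]
out_of_control_ranges = [first_sample + i for i, r in enumerate(ranges)
                         if r < LCL_R or r > UCL_R]

print(out_of_control_means)
print(out_of_control_ranges)
```

The script flags samples 29, 35, and 38 on the chart for means and sample 38 on the chart for ranges, in agreement with the discussion above.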



TABLE 16.2.2
Lengths of 1000 consecutively produced bolts (measurements represent deviations
in thousandths of an inch above or below 3.000 inches)

  Time                                   Bolt number
period   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
     1   7   3   4   4  -5   5   1  11  -2  -8   1  -1   4  -5  -3   0   6   9   4   1
     2   6   7   1   0  -1  -7  -7   3   5  -4   5   0  -2  -1  -1   2  -9   5   1   1
     3  -1 -11  -1  -1  -1   6  -7  -4  -2  -9 -14   1   6 -10  -1  -2  -2  -2   0  -1
     4  -9  -2  -4   5   2   5  -1   5   5  -9  -3  -8   0   0   6   1   6  -5   3  11
     5   8  11  -2   0   4  -3  -1   9   0  -1  -8 -11   6   8  -4  10   1   2  -2   5
     6  -2   2   4  -4   3  -4  -8   3   4   4   3   5  -5  -2  -3   0 -10  -4   1  -1
     7  -7  13   0   7   0   4  -6  -3  -2  -7  -3  -2   0   3  10   0  -8   2  -1  -2
     8  -8   3   9  -2   3  12  -5  -6 -14  -1   0   9   4  -1   1  -5   4  -9   2  -5
     9  -1  -4  -1  -3   4  -4   8  -1  -2  -5  -4  -7  -7  -5  -5   5  -2   3  -4   0
    10  -7  -9  -3   0   1  -3   2   1  -2   0   5  -5  -7  -5   0   7   2   2  -2   3
    11  -4 -12   3  -3  -6  -5 -12   5  -3   3   5   8   2  -6   2   2 -10  -2   0  -4
    12  -5   4   2  -2   3   8   4   3   3   1   4   1   1   5   3  -2   6   0   3 -11
    13 -10  12   5   3   7   8  -9   3   3  -2  10   0   4  -5   6  -3   0   5   3   5
    14   3  -1  -5   3  -7  -4   4  -2  -1  -2   0  10  -5  -5   5  -2   1  -7   4   4
    15  -7  -1   5   1  -6  13  -3   6  -1  -1   9   5   1  -3   2  -2  -4  -6  -3  -7
    16  -5  -5   6   8  -2  10   6   1   4   5 -16  -2  -3  -1   3   1   1   5  -3   3
    17  -2  -7   4   0   2 -10  -2   3   5   8  -8  -2  -6   0  -2   9   7  -2  -1   8
    18   7   0  -4   6   8   5  -2   2   0   2   3   1  -3   0  -3   0   3   2  -4   1
    19  -6  -4  10   3   3   3   5  -6   0   6   5   7   7  -5  -5  -2  -2  -1   2  -1
    20   0   6   5  13   1   6  -7   1   6   2   7  -8   0   1  -3  -3  -2   0  -2   4
    21  -6   3   2   2   1  -2  -4 -12   3   2  -6   6   4  -8   7  -5 -11   4   2  -2
    22   3   8   2  -9  12  -4   0  -6  -1   2  -2   1   5  -7   1   2   0   1   6  -6
    23   0   4   2  -7  -1   4  -3   6   9  -6   5   5   4  -4  -9  -8  -3  -1  -3   0
    24  -5  -2   0   0  12  -1  -6  -8   4   5   8  -7   4  -3   4   2   3   3  10   0
    25  -9   8  -1  -5   2  -2  -4   1   2   1  -1   7   6   1   0   1  -3  11   3   8
    26  -5  -5  -1  -2  -7  -4  -1  -1   2   6  -3   0   1   2  10   1  -2  -2  -4   9
    27  -1  -5   4   2  -5   6  -2   9 -12  -2   2   2   2  13  10   2  -2   1  -4   0
    28 -12  11  -2  -5  -2   7  -1  -3  -3  10   4  -5   1   7  -1   2  -5  -3  -7  -3
    29   9  10  13   8  12   2   6  16  10  10  13   1  14  10   9   4  18   9  12  21
    30 -10   4  -2 -14   1  -5   7   3  -4  -6   6   3  -9   1   6   4   4  -2   0  -3
    31   4  -5  -4   0   7   0  -1 -12  -8  -3   2  -3   1  -8  -3  16   4  -5  -3  -2
    32   5  -2  -1  -2   5   2  -3   2   7  -1   7  -1   3   3  -1  -7 -10  -3  -4  -3
    33   1  -3   0   2  -8  -3   0   1  -4  -1  -2  -2   1   0  -1   5   1  -3  -3   4
    34 -11  -1   1   8   3  -2   1  -3   9   3  -6  -7   4 -10   0  -1   5  11  17  -1
    35   0 -18 -11  -4 -17  -2 -14 -14 -11 -11 -10  -9  -9 -12  -8  -5 -16 -16  -7  -5
    36   1  10   0   5  -3   1  -5   5  -2  -4  -1   5  -9  -7  11  10  -4   4  -2   5
    37   8   2   2  -2  -1   4  -4  -3   4   0  -4 -10  -4  -3   5   3   1   2   9  14
    38  24   8 -35  14 -25  -1 -11 -34  36 -46 -13   7  20 -26  23   6   9 -21 -13   1
    39  -4  13   0   5   4   4  -6   1   8   2   6  -3   1   3   0   1   1   2   3  -6
    40   5  -1  -3  -3   2   6  -3   6   1   8   2   3   4  -3   6   1  -1   5  -6  -8
    41 -20  14 -19   3 -34  32  68   6 -33 -27 -60  17  12   5  -6   4 -29  24   0   3
    42  -2   3   4   0  -6  -7   9  -3   7   1   2  -6   1  -1  -1   2  -5  -4   4   5
    43   3  -4  -3   4  -4   3  -9  -1   6   1  -1   4  -3  -7   2   3  -9  -7  -2  -6
    44  15  48  27 -13  34 -49  13 -41  -1 -15  15 -12 -40   7  12 -12 -26   4 -20 -13
    45  -4  -7   0  -2  -4  -5   3  -4   5  -1   4  -2  -5  -3   5  -3  -3  -1   0   2
    46  -3  -2  11  -4  -5   6  -9   0  -1  -1  -3  -2   3   6  -9   1  -3   0   6   0
    47   0  -9  -6 -13 -11 -17  -4 -11 -15  -6 -17  -8  -2 -14   0 -15  -9  -5  -7 -10
    48  -8   1   3   5  -2   1   2   7   5  -7   4   7   2  -6   0  -5   1  -1   7  -3
    49   9  -2   6  15  11   7  10  11  -1   4  13  12  14   6  10   9  -2  11   5   5
    50   4  -4   4   1   6   3  -1  -2  -1  -2   1   8  -1   8  -1  -5   3   1   0   8



TABLE 16.2.3
Mean and range computed from random samples drawn from time periods shown in
Table 16.2.2

Sample
number   x_1   x_2   x_3   x_4   Total    Mean      R
     1     5     0    -2     1       4     1.0      7
     2     1    -1     7    -4       3    0.75     11
     3   -14    -1     1    -1     -15   -3.75     15
     4    -1     5     1    -3       2     0.5      8
     5    11     4     9    -3      21    5.25     14
     6    -2     4     0    -3      -1   -0.25      7
     7    -2     0     2     0       0     0.0      4
     8     3    -6     3    -2      -2    -0.5      9
     9     5    -7    -3    -4      -9   -2.25     12
    10     1    -3     2    -7      -7   -1.75      9
    11   -12   -12    -5     5     -24    -6.0     17
    12    -2     3     3    -2       2     0.5      5
    13     5     8     0     0      13    3.25      8
    14    -5     3    -5    -7     -14    -3.5     10
    15    -3     9    -2     2       6     1.5     12
    16     3     5    -5    10      13    3.25     15
    17    -2     2     0    -7      -7   -1.75      9
    18    -3     6     0    -3       0     0.0      9
    19     0    -2     2     7       7    1.75      9
    20    -7     6    13     0      12     3.0     20
    21     7     4     2     3      16     4.0      5
    22     2     2     1     0       5    1.25      2
    23    -3     5     0     4       6     1.5      8
    24    -3     2     0    -5      -6    -1.5      7
    25     0     1    -2    -3      -4    -1.0      4
    26     2    -5    10    -1       6     1.5     15
    27     2     0    10    13      25    6.25     13
    28    -5     7     1    -1       2     0.5     12
    29     9    18     4     1      32     8.0     17
    30     4    -5     3     1       3    0.75      9
    31     1    -5     4    -1      -1   -0.25      9
    32     5    -1     2    -1       5    1.25      6
    33     1    -2     1    -1      -1   -0.25      3
    34    -6    -1     8    -1       0     0.0     14
    35   -18   -14   -11   -16     -59  -14.75      7
    36     4    -1     5     5      13    3.25      6
    37     9    -4    -4     0       1    0.25     13
    38   -34     6   -26    -1     -55  -13.75     40
    39     0    13     1     2      16    4.00     13
    40     4    -1    -3     2       2    0.50      7
    41    24     3   -34    12
    42    -2    -4     2    -3
    43    -9    -1     2     3
    44    27   -13   -49   -12
    45    -7     0     3    -4
    46    -9    -3     6    -9
    47    -9   -17   -13   -10
    48     2    -2    -3    -1
    49     7     6     9    11
    50     8     1     4     3






FIGURE 16.2.2
Control charts for $\bar{x}$ and R, using data from Tables 16.2.2 and 16.2.3:
(a) control chart for $\bar{x}$; (b) control chart for R. The horizontal axis of each
chart is the sample number.


Exercises


16.2.1 Using the sample values in Table 16.2.3, compute the mean and range for samples
41 through 50. Plot them on the $\bar{x}$ and R charts that were constructed from the first 25
samples of the table. Explain the results.

16.2.2 Using the data of Table 16.2.2, draw samples of size 5 from each of the first 25
time periods. Construct an $\bar{x}$ and R chart based on these samples.

16.2.3 Draw samples of size 5 from each of the remaining time periods. Compute the
sample mean and range for each. Plot them on the control charts.


16.3 CONTROL CHARTS—ATTRIBUTES 

Sometimes it is not possible to determine the quality of an item of product by
measuring such things as length, weight, or temperature. Alternatively, we can
classify the quality of an item of product as either acceptable or nonacceptable.
In other words, the attribute of the product determines its quality. We may
consider an item that has the attribute of being defective as nonacceptable. We may
consider an item that has the attribute of being without defect as acceptable.


FIGURE 16.2.3
Control chart for $\bar{x}$, showing location of $\bar{x}$ for samples 26 through 40, as
shown in Table 16.2.3


FIGURE 16.2.4
Control chart for R, showing location of R for samples 26 through 40, as
shown in Table 16.2.3

Two types of control charts based on attributes are generally used. These are 
the p chart for fraction defective and the c chart for number of defects per item. 
We consider only the p chart in this text. [For a discussion of the c chart, see the 
books by Cowden (1957), Duncan (1974), Grant and Leavenworth (1980), Hansen 
(1963), and Peach (1964).] 



In constructing p charts, we define the value of p as

$$p = \frac{\text{number of defective items in a sample}}{\text{total number of items in the sample}}$$

and

$$\bar{p} = \frac{\text{total number of defective items in several samples}}{\text{total number of items inspected from several samples}}$$

Corresponding to the statistic p, we call the fraction of defective items in the
population p'. We may use p to estimate p'.

Recall some basic facts about the sampling distribution of p: 

1. When we draw samples of size n from a large finite population with a fraction
of defective items equal to p', the sampling distribution of p, the fraction of
defective items observed in the samples, is the binomial probability
distribution. We find the probability of observing in a sample a fraction of
defective items equal to p by evaluating

$$f(p) = \frac{n!}{(np)!\,(n - np)!}\,(p')^{np}(1 - p')^{n - np}, \qquad p = \frac{0}{n}, \frac{1}{n}, \frac{2}{n}, \ldots, \frac{n}{n}$$

2. The mean of all possible sample values of the fraction of defective items is
equal to p', the fraction of defective items in the population.

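3. The standard deviation of the distribution of sample p values is given by

$$\sigma_p = \sqrt{\frac{p'(1 - p')}{n}}$$

To estimate $\sigma_p$, we use

$$\hat{\sigma}_p = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$$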


4. When n is large and neither p' nor 1 - p' is too close to 1, the distribution
of sample p values is closely approximated by a normal distribution. We generally
consider these conditions to be met when both np' and n(1 - p') are greater than 5.
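The four facts above can be illustrated numerically. The sketch below is our own illustration under stated assumptions: the function names are hypothetical, and the values n = 50 and p' = 0.20 are chosen only for the demonstration and do not come from the text.

```python
import math

def prob_sample_fraction(n, defectives, p_prime):
    """Binomial probability that a sample of size n contains `defectives`
    defective items, i.e. that the sample fraction p equals defectives / n."""
    return (math.comb(n, defectives)
            * p_prime ** defectives
            * (1 - p_prime) ** (n - defectives))

def sigma_p(n, p_prime):
    """Standard deviation of the sampling distribution of p."""
    return math.sqrt(p_prime * (1 - p_prime) / n)

def normal_approximation_ok(n, p_prime):
    """Rule of thumb from fact 4: both n p' and n (1 - p') exceed 5."""
    return n * p_prime > 5 and n * (1 - p_prime) > 5

n, p_prime = 50, 0.20  # hypothetical values chosen for illustration
probs = [prob_sample_fraction(n, k, p_prime) for k in range(n + 1)]
mean_p = sum((k / n) * pr for k, pr in enumerate(probs))

print(round(sum(probs), 6), round(mean_p, 6), round(sigma_p(n, p_prime), 4))
print(normal_approximation_ok(n, p_prime))
```

The probabilities sum to 1, the mean of the sample fractions equals p', and the rule of thumb is satisfied for these illustrative values.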


The following example shows how to construct a p chart. 


EXAMPLE 16.3.1 Consider a factory that produces plastic toy cars. The process is 
a continuous one. Samples of size 50 are taken periodically from the output and 
inspected. It is felt that the process is currently under control, since much effort 
has gone into training the operators, adjusting the machines, and ensuring that 
only high-quality raw material is used. Table 16.3.1 shows the number of defective 
items discovered in the first 25 samples taken from the output after these efforts 
to improve the quality. We wish to see whether the process is, in fact, now in 
control. 

A preliminary control chart will aid in this decision. The first step in constructing
the preliminary control chart for the fraction of defective items is to compute
$\bar{p}$, the average fraction of defective items found in 25 samples taken from the
existing process.



TABLE 16.3.1
Number of defective items observed in 25 samples of size 50 from a
manufacturing process

   Time period       Number of                 Time period       Number of
 (sample number)   defective items     p     (sample number)   defective items     p
         1               12          0.24           14                9          0.18
         2               12          0.24           15                9          0.18
         3               22          0.44           16               22          0.44
         4               12          0.24           17                3          0.06
         5                6          0.12           18               11          0.22
         6               12          0.24           19                9          0.18
         7               13          0.26           20                9          0.18
         8               12          0.24           21               11          0.22
         9               13          0.26           22               12          0.24
        10                7          0.14           23               10          0.20
        11               10          0.20           24               24          0.48
        12                8          0.16           25               10          0.20
        13               13          0.26         Total             291



$$\bar{p} = \frac{\text{total number of defective items}}{\text{total number of items inspected}} = \frac{291}{1250} = 0.2328$$


We now use $\bar{p}$ as an estimate of p', the fraction of defective items, and use it as
the center line of our preliminary control chart.

To obtain our $3\sigma_p$ control limits, we add to and subtract from 0.2328 the
quantity

$$3\hat{\sigma}_p = 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = 3\sqrt{\frac{(0.2328)(0.7672)}{50}} = 0.1793$$

We now have

$$\mathrm{LCL} = 0.2328 - 0.1793 = 0.0535, \qquad \mathrm{UCL} = 0.2328 + 0.1793 = 0.4121$$


From these data we construct the preliminary control chart shown in Figure 16.3.1. 
A total of three points are outside the control limits on the chart. We conclude 
from this that the process is not under control. 
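The preliminary chart can be reproduced with a few lines of code. The following sketch is illustrative; the counts are transcribed from Table 16.3.1 and the variable names are our own.

```python
import math

n = 50  # items inspected per sample
# Number of defective items in samples 1-25, transcribed from Table 16.3.1
defectives = [12, 12, 22, 12, 6, 12, 13, 12, 13, 7, 10, 8, 13,
              9, 9, 22, 3, 11, 9, 9, 11, 12, 10, 24, 10]

p_bar = sum(defectives) / (n * len(defectives))      # center line
three_sigma = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
lcl = p_bar - three_sigma
ucl = p_bar + three_sigma

# Sample numbers whose fraction defective lies outside the 3 sigma limits
outside = [i + 1 for i, d in enumerate(defectives)
           if d / n < lcl or d / n > ucl]

print(round(p_bar, 4), round(lcl, 4), round(ucl, 4))
print(outside)
```

The script reproduces the center line 0.2328 and flags samples 3, 16, and 24, the three points that lie above the upper control limit.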

Our objective is to get the process under control. Thus we investigate the points 
that fell outside the control limits to look for some assignable cause. Let us assume 
that we found the following causes for the points that were out of control: During 
the time period in which sample 3 was drawn, several employees were out sick. 
Employees from another department were brought in as temporary replacements. 
The supervisor feels that this caused the large proportion of defective items in 
sample 3. During the period in which sample 16 was drawn, a machine got out 
of adjustment and caused a larger-than-usual proportion of defective items to be 
produced. Sample 24 was taken just after a new machine was installed and was 
being broken in. 

As a result of these investigations, we decide to discard the data for which we
found an assignable cause. We compute a new value of $\bar{p}$ and new control limits
based on the remainder of the data. These revised values are as follows:




$$\bar{p} = 0.2027$$

$$\mathrm{UCL} = 0.2027 + 3\sqrt{\frac{(0.2027)(0.7973)}{50}} = 0.2027 + 0.1706 = 0.3733$$

$$\mathrm{LCL} = 0.2027 - 0.1706 = 0.0321$$


FIGURE 16.3.1
A p chart for analyzing past data. The chart marks the preliminary values
(UCL = 0.4121, $\bar{p}$ = 0.2328, LCL = 0.0535) and the revised values
(UCL = 0.3733, $\bar{p}$ = 0.2027, LCL = 0.0321); the horizontal axis is the
sample number.
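The revision can be sketched in code as follows. This is an illustration under stated assumptions: the counts are transcribed from Table 16.3.1, samples 3, 16, and 24 are the ones discarded in the text, and the variable names are our own. Because the script carries full precision rather than the rounded value 0.2027, the last digit of a limit may differ slightly from the hand computation.

```python
import math

n = 50
# Number of defective items in samples 1-25, transcribed from Table 16.3.1
defectives = [12, 12, 22, 12, 6, 12, 13, 12, 13, 7, 10, 8, 13,
              9, 9, 22, 3, 11, 9, 9, 11, 12, 10, 24, 10]
discarded = {3, 16, 24}  # samples for which an assignable cause was found

kept = [d for i, d in enumerate(defectives, start=1) if i not in discarded]

p_bar = sum(kept) / (n * len(kept))                  # revised center line
three_sigma = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
lcl = p_bar - three_sigma
ucl = p_bar + three_sigma

# Any retained sample that lies outside the revised limits
still_outside = [i for i, d in enumerate(defectives, start=1)
                 if i not in discarded and (d / n < lcl or d / n > ucl)]

print(round(p_bar, 4), round(three_sigma, 4))
print(still_outside)
```

No retained sample falls outside the revised limits, so the revised center line and limits can serve as the standard for future production.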


Figure 16.3.1 shows these revised values as the revised UCL and LCL, respectively.
No sample values, other than those for which an assignable cause was
found, fall outside the new limits. We take these new limits, along with the new
center line, as standards for controlling production in the future.

Table 16.3.2 shows the sample results for the time periods 26 through 50. We
plot the values of p on the revised control chart as shown in Figure 16.3.2. Three
points fall outside the control limits, two above and one below. When we see
these outlying points, we look for an assignable cause. Assume that for each of
these values of p falling outside the control limits, we find an assignable cause
and make the necessary corrections. We are especially interested in the value that
falls below the lower control limit. Perhaps the cause is one that we can use to
bring about a general reduction in the fraction of defective items in the factory’s
output.


TABLE 16.3.2
Sample data from time periods 26-50

   Time period       Number of                 Time period       Number of
 (sample number)   defective items     p     (sample number)   defective items     p
        26               10          0.20           39                8          0.16
        27               19          0.38           40                6          0.12
        28                5          0.10           41               13          0.26
        29                8          0.16           42               20          0.40
        30               12          0.24           43                9          0.18
        31                9          0.18           44                8          0.16
        32               11          0.22           45                5          0.10
        33                8          0.16           46               14          0.28
        34                7          0.14           47                1          0.02
        35               11          0.22           48                9          0.18
        36               16          0.32           49               11          0.22
        37               11          0.22           50               10          0.20
        38                7          0.14


We have considered only the case in which the sample size n is constant. But
at times we may need to have a p chart for a variable n. Furthermore, even when
points do not fall outside the control limits, in actual practice their behavior within
the limits is watched. In particular, if a large number of consecutive points appears
to fall either above or below the center line, the quality-control manager
investigates the situation. There are tests to determine whether or not such a pattern
indicates a real trend. For further consideration of this point, and of the case in
which the sample size is not constant, see the previously cited texts.


Exercises


16.3.1 The daily production of window air conditioners is 1000 units. A random sample 
of 50 units is inspected each day. After 28 days a quality-control technician finds a total 
of 182 defective units. Assume that the production process is in control. Compute the 
center line and 3 sigma control limits for a p chart. 

16.3.2 The following table shows the number of defective items in 25 samples of size 50
drawn from a manufacturing process. One sample was drawn from each of 25 time periods.
(a) Use these data to construct a p chart for analyzing past data. (b) Plot the 25 values of
p on the chart. Suggest possible causes for any values lying outside the control limits.

   Time period      Number of       Time period      Number of
 (sample number)    defectives    (sample number)    defectives
         1              10               14              11
         2              10               15              12
         3              11               16               3
         4              10               17              20
         5              12               18              10
         6              10               19               7
         7              12               20              21
         8              19               21              12
         9               9               22               9
        10               7               23               9
        11               8               24              12
        12               8               25              10
        13               7


FIGURE 16.3.2
Revised p chart, showing values of p for samples 26 through 50




16.3.3 (Refer to Exercise 16.3.2) (a) Discard the data giving rise to points outside the
control limits (if any). Construct a revised p chart. (b) Compute values of p from the data
in the following table and plot the values on the revised p chart. Does the process remain
in control?

   Time period      Number of       Time period      Number of       Time period      Number of
 (sample number)    defectives    (sample number)    defectives    (sample number)    defectives
        26               6               36              10               46              10
        27              10               37              13               47               9
        28               7               38              18               48              20
        29              19               39               9               49               6
        30               4               40              12               50              10
        31               8               41               6
        32              11               42               3
        33               7               43              11
        34               9               44               8
        35               5               45               9



16.4 ACCEPTANCE SAMPLING FOR ATTRIBUTES 

During the production of any item, from time to time questions of acceptability 
arise. Is the incoming raw material acceptable? After the first stages of manufacture
are completed, is the semifinished product acceptable for further processing?
Is the quality of the outgoing product acceptable? The logical response to such 
questions is to inspect the items and find out. The questions and the response 
imply that: (1) the product can be inspected, (2) we have established criteria 
whereby we can classify an item of product as either acceptable or not, and (3) 
the results of an inspection will lead to some type of action. 

Suppose that we decide to inspect a product at one of the stages mentioned 
above, to find out whether it is acceptable. We can follow one of several paths. 
First, we decide whether to inspect 100% of the product or only a part of it. 
Typically we decide on the latter. That is, we decide to sample. Sampling requires 
less time, money, and effort. If the test to which we subject our product is 
destructive, sampling is the only way of ensuring that there will remain a product 
to sell. In short, the considerations that favor sampling are applicable within the 
context of quality control. Furthermore, inspecting 100% of the product does not 
mean that we’ll recognize all unacceptable items. When inspectors have to examine
every item produced, fatigue and monotony surely impair their efficiency.

Assume that the items of product are available in groups, called inspection lots.
The sampling procedure, called lot-by-lot sampling inspection, may then proceed.
Inspection of an item results in its being classified as either acceptable (nondefective)
or not acceptable (defective). In other words, an item of product possesses one 
or the other attribute: defective or nondefective. Therefore we call the procedure 
attribute sampling. When we carry out a sampling procedure, we draw the sampled 
items at random from the inspection lots, without replacement. Depending on the 
results, we either accept or reject the lots. 





The actual sampling procedure is called a sampling plan. The three standard 
plans are as follows. 

Single Sampling Plan As the name implies, a single sampling plan requires the 
drawing of only one sample. The following is an example. 

EXAMPLE 16.4.1 Inspection lots consist of 1000 items of product. An inspector 
selects from a lot a sample of size 80, and examines each. When she finds 7 or 
fewer defective items out of the 80 in the sample, she considers the lot acceptable, 
and accepts it. When she finds 8 or more defective, she considers the lot
unacceptable, and rejects it.

Double Sampling Plan A double sampling plan allows an inspector to draw two 
samples, if needed, before reaching a decision to accept or reject a lot. The 
following is an example. 

EXAMPLE 16.4.2 We examine a sample of 50 items. When we find 3 or fewer 
defective items, we accept the lot. When we find 7 or more defective items, we 
reject the lot. When we find more than 3 but fewer than 7 defective items, we 
neither accept nor reject. Instead we select a second sample of 50 items from the 
same lot, and examine these items. If the number of defectives in the second 
sample plus those in the first is equal to or less than 6, we accept the lot. If the 
number of defectives in the second sample plus those in the first is equal to or 
greater than 7, we reject the lot. 

Multiple Sampling Plan Multiple sampling plans allow for the drawing of more 
than two samples, if the inspector needs to do so in order to reach a decision to 
reject or accept a lot. The following is an example. 

EXAMPLE 16.4.3 We select a sample of size 20 from an inspection lot. When 
examination reveals no defective items, we accept the lot. When we find 4 or 
more defectives, we reject the lot. When we find 1, 2, or 3 defective items, we 
select another sample of size 20. When examination of both samples (40 items) 
turns up only 1 defective, we accept the lot. When we find 6 or more defectives, 
we reject the lot. When we find 2, 3, 4, or 5 defectives, we take another sample 
of size 20. This process continues until we decide to reject or accept the lot. We 
reach a decision at least by the time we examine the seventh sample of size 20. 
The flow diagram in Figure 16.4.1 shows the number of defectives required for 
a decision at each stage of the process. 

These sampling plans are from a compilation of sampling plans and procedures
for inspection by attributes prepared by the Department of Defense. (The title is
Military Standard Sampling Procedures and Tables for Inspection by Attributes,
MIL-STD-105D, 29 April 1963.) It is for sale by the Superintendent of Documents,
U.S. Government Printing Office, Washington, D.C. 20402.

Here we shall consider single and double sampling plans only. We shall not
discuss multiple sampling, though that is important. It is convenient to adopt the
following notation:


N = the size of the lot
n = the size of the sample, single sampling
$n_1$ = the size of the first sample, double sampling
$n_2$ = the size of the second sample, double sampling
d = the number of defective items in sample, single sampling
$d_1$ = the number of defective items in first sample, double sampling
$d_2$ = the number of defective items in second sample, double sampling
c = acceptance number, single sampling
c + 1 = rejection number, single sampling
$c_1$ = acceptance number on first sample, double sampling
$c_2$ = acceptance number on second sample, double sampling
$c_2 + 1$ = rejection number for second sample, double sampling
p' = lot fraction defective (sometimes expressed as percent)
$P_a$ = probability of acceptance of a lot


We write the criteria for acceptance and rejection in the single and double sampling
plans in condensed form as follows:

Single sampling: d defective items in n:

    $d \le c$, accept lot; $d > c$, reject lot

Double sampling: $d_1$ defective items in $n_1$:

    $d_1 \le c_1$, accept lot; $d_1 > c_2$, reject lot;
    $c_1 < d_1 \le c_2$, take second sample of size $n_2$

Double sampling: $d_1 + d_2$ defective items in $n_1 + n_2$:

    $d_1 + d_2 \le c_2$, accept lot; $d_1 + d_2 > c_2$, reject lot
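The condensed criteria translate directly into decision rules. The sketch below is illustrative only; the function names are hypothetical, and the acceptance numbers used in the checks are those of Example 16.4.1 (c = 7) and Example 16.4.2 ($c_1$ = 3, $c_2$ = 6).

```python
def single_sampling(d, c):
    """Single sampling: accept when d <= c, otherwise reject."""
    return "accept" if d <= c else "reject"

def double_sampling_first(d1, c1, c2):
    """Double sampling, first sample: accept, reject, or call for a second sample."""
    if d1 <= c1:
        return "accept"
    if d1 > c2:
        return "reject"
    return "take second sample"

def double_sampling_second(d1, d2, c2):
    """Double sampling, second sample: decide on the combined number of defectives."""
    return "accept" if d1 + d2 <= c2 else "reject"

# Example 16.4.1: single sampling with c = 7
print(single_sampling(7, c=7), single_sampling(8, c=7))

# Example 16.4.2: double sampling with c1 = 3 and c2 = 6
print(double_sampling_first(5, c1=3, c2=6))
print(double_sampling_second(5, 1, c2=6), double_sampling_second(5, 2, c2=6))
```

With these rules a first-sample count of 4, 5, or 6 in Example 16.4.2 leads to a second sample, and the lot is accepted only when the combined count does not exceed 6.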


In order to use MIL-STD-105D, one must be able to specify the lot size, the 
acceptable quality level (AQL), the type of sampling plan (single, double, or 
multiple), and the inspection level desired. The AQL is the maximum percent 
defective that we can consider satisfactory as a process average (average percent 
defective in the production process), or lot percent defective. The purpose of 
sampling inspection is to judge whether or not this AQL is, in fact, exceeded. 

Three levels of inspection are available: level I, for use when we need less
discrimination; level II, for normal inspection; and level III, for use when we need
more discrimination. We use level I when the supplier of the product has a
reputation for providing a high-quality product. We use level III when the product
is from a supplier with a reputation for submitting a poor-quality product. We use
level II for the cases in between.



To illustrate the use of MIL-STD-105D, consider the following example. 

EXAMPLE 16.4.4 Company A buys parts from Company B. The parts are received
in lots of size N = 1000, and inspected at Company A’s receiving inspection
station. The company uses a single sampling plan selected from MIL-STD-105D,
with an AQL of 4% for a normal inspection level (level II). Using MIL-STD-105D
for the present example consists of the following steps.

1. Enter Table 16.4.1 (Table I of MIL-STD-105D) with the lot size of 1000 and 
general inspection level II, to obtain a sample size code letter. For lot sizes from 
501 to 1200 and general inspection level II, the sample size code letter is J. 

2. Enter Table 16.4.2 (Table II-A of MIL-STD-105D) with the sample size code 
letter and the AQL, to determine the sample size and acceptance and rejection 
numbers. 

Step 1 revealed the sample size code letter to be J. Table 16.4.2 tells us that 
the sample size should be 80. Our AQL was designated as 4. Locating in Table 
16.4.2 the intersection of the column headed 4.0 and the row labeled J, we find 
that the acceptance and rejection numbers are 7 and 8, respectively. This tells us 
to accept a lot if the random sample of size 80 yields 7 or fewer defective items, 
and to reject it if it yields 8 or more defective items. 
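The two table lookups in this example can be sketched in code. The following is an illustration, not an implementation of the standard: the function name `code_letter` is hypothetical, only the general inspection level II column of Table 16.4.1 and the sample sizes listed in Table 16.4.2 are encoded, and the acceptance and rejection numbers are not looked up.

```python
import bisect

# Upper bounds of the lot-size classes in Table 16.4.1 (the last class is open-ended)
LOT_SIZE_UPPER_BOUNDS = [8, 15, 25, 50, 90, 150, 280, 500, 1200, 3200,
                         10000, 35000, 150000, 500000]
# Code letters for general inspection level II, one per lot-size class
LEVEL_II_LETTERS = ["A", "B", "C", "D", "E", "F", "G", "H", "J", "K",
                    "L", "M", "N", "P", "Q"]
# Sample size for each code letter, as listed in Table 16.4.2
SAMPLE_SIZES = {"A": 2, "B": 3, "C": 5, "D": 8, "E": 13, "F": 20, "G": 32,
                "H": 50, "J": 80, "K": 125, "L": 200, "M": 315, "N": 500,
                "P": 800, "Q": 1250, "R": 2000}

def code_letter(lot_size):
    """Sample size code letter for general inspection level II."""
    if lot_size < 2:
        raise ValueError("Table 16.4.1 starts at a lot size of 2")
    # Index of the first class whose upper bound is at least the lot size
    return LEVEL_II_LETTERS[bisect.bisect_left(LOT_SIZE_UPPER_BOUNDS, lot_size)]

letter = code_letter(1000)
print(letter, SAMPLE_SIZES[letter])
```

For a lot of 1000 items the lookup returns code letter J and a sample size of 80, as found in steps 1 and 2.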

Other standard sampling plans are available. If you are interested in acceptance 
sampling, two volumes that are important are the book of sampling inspection 
tables by Dodge and Romig (1959), and the book of sampling inspection by 
Freeman et al. (1948). 


TABLE 16.4.1
Sample size code letters. Table I of MIL-STD-105D

                          Special inspection levels    General inspection levels
Lot or batch size          S-1    S-2    S-3    S-4        I      II     III
      2 to       8          A      A      A      A         A      A      B
      9 to      15          A      A      A      A         A      B      C
     16 to      25          A      A      B      B         B      C      D
     26 to      50          A      B      B      C         C      D      E
     51 to      90          B      B      C      C         C      E      F
     91 to     150          B      B      C      D         D      F      G
    151 to     280          B      C      D      E         E      G      H
    281 to     500          B      C      D      E         F      H      J
    501 to   1,200          C      C      E      F         G      J      K
  1,201 to   3,200          C      D      E      G         H      K      L
  3,201 to  10,000          C      D      F      G         J      L      M
 10,001 to  35,000          C      D      F      H         K      M      N
 35,001 to 150,000          D      E      G      J         L      N      P
150,001 to 500,000          D      E      G      J         M      P      Q
500,001 and over            D      E      H      K         N      Q      R


Source: MIL-STD-105D, Sampling Procedures and Tables for Inspection by Attributes, Department of 
Defense, 29 April 1963 




TABLE 16.4.2
Single sampling plans for normal inspection (master table). Table II-A
of MIL-STD-105D

Sample
size                             Acceptable Quality Levels (each entry is Ac Re)
code    Sample
letter  size    0.010  0.015  0.025  0.040  0.065   0.10   0.15   0.25   0.40   0.65    1.0    1.5
  A        2      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓
  B        3      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓
  C        5      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓
  D        8      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓     0 1
  E       13      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓     0 1     ↑
  F       20      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓     0 1     ↑      ↓
  G       32      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓     0 1     ↑      ↓     1 2
  H       50      ↓      ↓      ↓      ↓      ↓      ↓      ↓     0 1     ↑      ↓     1 2    2 3
  J       80      ↓      ↓      ↓      ↓      ↓      ↓     0 1     ↑      ↓     1 2    2 3    3 4
  K      125      ↓      ↓      ↓      ↓      ↓     0 1     ↑      ↓     1 2    2 3    3 4    5 6
  L      200      ↓      ↓      ↓      ↓     0 1     ↑      ↓     1 2    2 3    3 4    5 6    7 8
  M      315      ↓      ↓      ↓     0 1     ↑      ↓     1 2    2 3    3 4    5 6    7 8   10 11
  N      500      ↓      ↓     0 1     ↑      ↓     1 2    2 3    3 4    5 6    7 8   10 11  14 15
  P      800      ↓     0 1     ↑      ↓     1 2    2 3    3 4    5 6    7 8   10 11  14 15  21 22
  Q     1250     0 1     ↑      ↓     1 2    2 3    3 4    5 6    7 8   10 11  14 15  21 22    ↑
  R     2000      ↑      ↑     1 2    2 3    3 4    5 6    7 8   10 11  14 15  21 22    ↑      ↑

↓ = Use first sampling plan below arrow. If sample size equals, or exceeds, lot or batch size, do 100 percent
inspection.

↑ = Use first sampling plan above arrow.

Ac = Acceptance number
Re = Rejection number

Operating Characteristic (OC) Curves We can summarize any sampling plan 
graphically by means of an operating characteristic (OC) curve. The OC curve 
indicates the probability P a that a lot of submitted product is accepted if it is of 
a given quality as indicated by p, the fraction of defective items in the lot. Stated 
another way, the operating characteristic curve of a sampling plan indicates the 
percentage of lots that may be expected to be accepted if they are of a given 
quality. MIL-STD-105D gives OC curves for various combinations of p, P a , AQL, 
and sample size. 

EXAMPLE 16.4.5 Find the OC curve for the single sampling plan described earlier. 

To locate the curve, recall that the code letter for the sample size is J. Accordingly,
we go to Chart J in MIL-STD-105D (see Figure 16.4.2). Since the AQL
is 4, we locate the curve labeled 4 in Chart J of Figure 16.4.2. On the horizontal
axis, we can find various values of p (percent defective, in our case) for a submitted
lot. Following the line up to the curve and reading from this point on the
curve across to the vertical axis, we read the probability (P a ) of the lot’s being 
accepted. Suppose that 6% of the items are defective. We locate the figure 6 on 
the horizontal axis, move up the vertical line drawn at 6 until we reach the curve 
labeled 4.0, and read across to the vertical axis. There we find that P a is equal to 
0.90. This means that the probability of the lot’s being accepted is 0.90. In other 
words, if lots are submitted with a consistent 6% defective, we accept 90% of 
them in the long run. 
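
The chart reading above can be cross-checked numerically. The following is a minimal sketch, assuming a single sampling plan with sample size n and acceptance number c, and a binomial model for the number of defectives in the sample. The values n = 80 and c = 7 are illustrative assumptions for a code letter J plan, not figures quoted in this passage, and the function name is hypothetical.

```python
from math import comb


def acceptance_probability(n: int, c: int, p: float) -> float:
    """P_a: probability of finding at most c defectives in a sample of n (binomial model)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


# Assumed plan parameters (illustrative only): n = 80, acceptance number c = 7.
for percent_defective in (2, 4, 6, 8, 10):
    p_a = acceptance_probability(80, 7, percent_defective / 100)
    print(f"{percent_defective:>2}% defective -> P_a = {p_a:.3f}")
```

Plotting $P_a$ against p for a fine grid of p values traces out the OC curve for the assumed plan.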




Source: MIL-STD-105D, Sampling Procedures and Tables for Inspection by Attributes, Department of Defense, 29 April 
1963. 


Alternatively, we can get the same information for selected values of $P_a$ from
Table X-J-1 (Figure 16.4.2). This table shows that 90% of lots with 5.91% defectives
are accepted. We can attribute this slight difference from the value read from
the chart to the loss in precision that is usual in graphs of mathematical data.

In addition to the OC curve, we can construct other curves to describe a sampling
plan. These are listed below with a brief description. [For a more detailed
discussion of these curves, refer to the book by Burr (1953).]

1. The average outgoing quality (AOQ) curve. This curve makes it easier to
determine the quality in all lots after we have examined the rejected lots 100%
and removed all defectives.

2. The average sample number (ASN) curve. This curve shows the average number
of pieces per lot that we must inspect before we reach a decision to reject or
accept the lot. The ASN depends on the quality of the incoming lot. This curve
is useful when double or multiple sampling plans are used.

3. The average total inspection (ATI) curve. If we want to know the average total
amount of inspection per lot, including the sampling inspection and any 100%
sorting required, then we would find an average total inspection curve of value.
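
Two of these curves can be sketched for a single sampling plan. The example below is a hedged illustration, not a procedure given in the text: it assumes the commonly used rectifying-inspection forms AOQ = $P_a \, p \, (N - n)/N$ and ATI = $n + (1 - P_a)(N - n)$, a binomial model for $P_a$, and hypothetical plan values (lot size N = 1000, n = 80, c = 7).

```python
from math import comb


def acceptance_probability(n, c, p):
    """Probability of at most c defectives in a sample of n (binomial model)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def average_outgoing_quality(lot_size, n, c, p):
    """AOQ, assuming rejected lots are screened 100% and defectives found are removed."""
    return acceptance_probability(n, c, p) * p * (lot_size - n) / lot_size


def average_total_inspection(lot_size, n, c, p):
    """ATI: n items are always sampled; the other N - n are inspected when the lot is rejected."""
    return n + (1 - acceptance_probability(n, c, p)) * (lot_size - n)


# Hypothetical plan: N = 1000, n = 80, c = 7.
for p in (0.02, 0.04, 0.06, 0.08):
    aoq = average_outgoing_quality(1000, 80, 7, p)
    ati = average_total_inspection(1000, 80, 7, p)
    print(f"p = {p:.2f}: AOQ = {aoq:.4f}, ATI = {ati:.1f}")
```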


Exercises 


16.4.1 Find a single sampling plan for normal inspection to fit the following situations:
(a) Inspection lots of size 600 and an AQL of 0.15. (b) Inspection lots of size 800 and
an AQL of 2.5. (c) Inspection lots of size 900 and an AQL of 10. (d) Inspection lots of
size 1000 and an AQL of 1.5. (e) Inspection lots of size 1200 and an AQL of 4.

FIGURE 16.4.2 Tables for sample-size code letter J, Table X-J of MIL-STD-105D (Source: MIL-STD-105D,
Sampling Procedures and Tables for Inspection by Attributes, Department of Defense, 29 April 1963)

16.4.2 For each of the plans in Exercise 16.4.1, find the quality of the process for which 
we may expect 95% of the lots to be accepted. 


16.5 ACCEPTANCE SAMPLING BY VARIABLES 

Often the characteristic we want to study in quality control is measurable on a 
continuous scale. It is reasonable to assume that the characteristic follows, at least 
approximately, some specific distribution, say the normal. When this is the case, 
we may want to use a sampling plan based on measurements such as the sample 
mean, or the sample mean and standard deviation. Sampling plans of this type 
are called variables sampling plans. Their use in acceptance sampling is called 
acceptance sampling by variables. 



















We shall discuss two types of variables sampling plans: (1) Known sigma plans, 
used when the variable of interest has a specified distribution and the standard 
deviation is known. (2) Unknown sigma plans, used when the variable of interest 
has a specified distribution and the standard deviation is unknown. In each case 
we’ll develop a plan that indicates how large a sample to inspect and give some 
criterion against which to compare the results in order to determine whether to 
reject the lot. 

Known Sigma Plans

Assume that the variable of interest is normally distributed and that the standard
deviation $\sigma$ is known and constant from lot to lot. This is not an unrealistic
assumption for many manufacturing processes. By a known $\sigma$, we mean an estimate
based on a large amount of previous data, since, in general, the true $\sigma$
cannot be known.

The following three types of plans are available:

1. A plan that protects us from too small a lot mean. In this type, we specify a
minimum value of X. That is, we determine for the lot mean an acceptable value
located enough standard deviations above the specification to ensure that very few
values of X are below the specification. Likewise, we choose a rejectable value
for the lot mean so close (as measured in standard deviations) to the minimum
specification that we are certain to reject lots with a mean this low.

2. A plan that protects us from too large a lot mean. The procedure is similar to
the first plan, except that we specify a maximum value of X.

3. A plan that provides protection against a lot mean that deviates too far in either
direction. This type of plan is a logical extension of the first two.

The following example illustrates the case in which we are interested in protection
against a too-small lot mean.

EXAMPLE 16.5.1 Suppose that we know that the tensile strength of a certain type
of wire is normally distributed. We estimate the true variance from extensive past
data. For practical purposes, we consider that we have a known sigma, which we
call $\sigma'$, of $\sqrt{30} = 5.48$. A roll of wire with a tensile strength less than 87 is not
strong enough to do the job for which it is intended. We would like 1% or fewer
of our rolls of wire to be below 87 in tensile strength. In other words, if no more
than 1% of the items in a lot are defective, we will accept the lot. What will the
value of the lot mean $\bar{X}'$ have to be in order for us to achieve this state of affairs
in the long run?

Appendix Table C shows that under the standard normal curve, the value of z
to the left of which lies 0.01 of the area is -2.33. We can use the relationship

$$z = \frac{X - \bar{X}'}{\sigma'}$$

to determine the needed value of $\bar{X}'$. When we substitute known quantities into
this expression, we have

$$-2.33 = \frac{87 - \bar{X}'}{5.48}$$

which yields $\bar{X}' = 99.7684$. We designate this our acceptable quality and call
it $\bar{X}'_2$. Let us now find a rejectable value of $\bar{X}'$, which we'll call $\bar{X}'_1$. We begin by
specifying some proportion of X values for which, if this proportion lies below
87 in a given lot, we would want to be sure of rejecting the lot. Assume that we
would want to reject lots with 5% defective items. The desired proportion is 0.05.
We now find the rejectable ($\bar{X}'_1$) value by noting that Table C indicates that the z
value to the left of which 0.05 of the area under the standard normal curve lies
is -1.645. This leads to

$$-1.645 = \frac{87 - \bar{X}'_1}{5.48}$$

and $\bar{X}'_1 = 96.0146$, the rejectable value of $\bar{X}'$. We must now answer two questions:
(1) What risk do we wish to run of accepting a lot with a mean tensile strength
of 96.0146, if a lot of this quality should be offered? (2) What risk do we want
to take of rejecting a lot with a mean tensile strength of 99.7684?
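
The two lot means in this example can be reproduced with a short calculation. This is a minimal sketch, assuming normally distributed tensile strength; the function name is hypothetical, and the z values come from Python's `statistics.NormalDist` rather than from Table C, so the results differ slightly from the rounded figures 99.7684 and 96.0146.

```python
from statistics import NormalDist


def lot_mean_for_fraction_below(spec_limit, sigma, fraction_below):
    """Lot mean at which `fraction_below` of the items fall below `spec_limit`."""
    z = NormalDist().inv_cdf(fraction_below)  # negative for fractions under 0.5
    return spec_limit - z * sigma  # rearranged from z = (spec_limit - mean) / sigma


sigma = 30 ** 0.5  # known sigma from the example, about 5.48
acceptable_mean = lot_mean_for_fraction_below(87, sigma, 0.01)
rejectable_mean = lot_mean_for_fraction_below(87, sigma, 0.05)
print(f"acceptable lot mean: {acceptable_mean:.4f}")
print(f"rejectable lot mean: {rejectable_mean:.4f}")
```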

The risk of rejecting a lot of acceptable quality is designated $\alpha$. The risk of
accepting a lot of rejectable quality is designated $\beta$. For the present example, $\alpha$
is specified as 0.01, and $\beta$ as 0.10. We want a plan that meets these requirements
and, at the same time, tells us how large a sample to take and the minimum
sample mean $\bar{X}$ for which we will accept a lot. This minimum value of the sample
mean is called K. Suppose that the sample comes from a population with a mean
$\bar{X}'$ of 96.0146 and $\sigma' = 5.48$. We know that the sampling distribution of the
means from all possible samples is normal, with a mean of 96.0146 and a standard
deviation equal to

$$\frac{\sigma'}{\sqrt{n}} = \frac{5.48}{\sqrt{n}}$$

Similarly, suppose that the sample comes from a population with a mean of
99.7684 and a standard deviation of 5.48. We know that the sampling distribution
of means computed from samples from this population is normal, with a mean
of 99.7684 and a standard deviation of $5.48/\sqrt{n}$. Figure 16.5.1 shows these two
distributions, as well as $\alpha$ and $\beta$.

FIGURE 16.5.1 Sampling distributions of $\bar{x}$ from populations of acceptable and rejectable quality
with corresponding risks, tensile strength of wire

Considering, in turn, each of the two sampling distributions shown in Figure
16.5.1, we can make the following observations. We can convert the sampling
distribution on the left to the standard normal distribution using the relationship

$$z = \frac{K - \bar{X}'_1}{\sigma'/\sqrt{n}} \tag{16.5.1}$$

Table C shows that the value of z to the right of which 0.10 of the area under the
standard normal curve lies is +1.28. Substituting this and other known quantities
into Equation 16.5.1 yields

$$+1.28 = \frac{K - 96.0146}{5.48/\sqrt{n}}$$

If we proceed in the same manner with the sampling distribution shown on the
right, we find that the appropriate z value is -2.33. This leads to

$$-2.33 = \frac{K - 99.7684}{5.48/\sqrt{n}}$$

Solving these two equations simultaneously for n gives n = 27.77.

To find K, we can either substitute 27.77 into the first equation, which was
derived from the distribution with which we associated $\beta$, or we can substitute it
into the equation arising from the distribution with which we associated $\alpha$. Both
are illustrated here. Of course, in actual practice, we would select one or the other
only. Substituting into the first equation, we have

$$1.28 = \frac{K - 96.0146}{5.48/\sqrt{27.77}}$$

$$1.3311 = K - 96.0146, \qquad K = 97.3457$$

Substituting into the second equation, we have

$$-2.33 = \frac{K - 99.7684}{5.48/\sqrt{27.77}}$$

$$-2.4230 = K - 99.7684, \qquad K = 97.3454$$

(note rounding error).

We always round up the actual sample size used, in this case to 28. In solving
for K, we always use the calculated sample size, not the rounded value, to obtain
the correct K value. The sampling plan now reads as follows: "Select 28 specimens
at random and test each for tensile strength. If the mean tensile strength is
greater than or equal to 97.3457, accept the lot. If the mean tensile strength is
less than 97.3457, reject the lot."
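
The simultaneous solution for n and K can be written in closed form. The sketch below assumes the lower-specification case of this example; adding the two standardized equations gives $\sqrt{n} = (z_\alpha + z_\beta)\sigma' / (\bar{X}'_2 - \bar{X}'_1)$, and K then follows from Equation 16.5.1. The function name is hypothetical, and because the z values are computed rather than read from Table C, the output differs slightly from 27.77 and 97.3457.

```python
from math import ceil, sqrt
from statistics import NormalDist


def known_sigma_plan(rejectable_mean, acceptable_mean, sigma, alpha, beta):
    """Return (calculated n, n rounded up, K) for protection against a too-small lot mean."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 2.33 for alpha = 0.01
    z_beta = NormalDist().inv_cdf(1 - beta)    # about 1.28 for beta = 0.10
    n = ((z_alpha + z_beta) * sigma / (acceptable_mean - rejectable_mean)) ** 2
    # K is computed from the calculated n, not the rounded sample size.
    k = rejectable_mean + z_beta * sigma / sqrt(n)
    return n, ceil(n), k


n_exact, n_used, k = known_sigma_plan(96.0146, 99.7684, 5.48, alpha=0.01, beta=0.10)
print(f"calculated n = {n_exact:.2f}, sample size used = {n_used}, K = {k:.4f}")
```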





The procedure for obtaining a sampling plan when we are interested in protection
against too large a lot mean is, as we have noted, analogous to this procedure.
We can outline it briefly as follows.

1. Determine the specification value.

2. Choose an acceptable value for a lot mean; call it $\bar{X}'_1$. The same considerations
as were noted above should go into its selection.

3. Choose a rejectable value for a lot mean; call it $\bar{X}'_2$. Note that the notation is
reversed from what it was when we were interested in protection against a too-small
mean. Whereas $\bar{X}'_1$ previously referred to rejectable quality and $\bar{X}'_2$ referred
to acceptable quality, $\bar{X}'_1$ now refers to acceptable quality and $\bar{X}'_2$ to rejectable
quality. The purpose of this notation is to retain the relationship $\bar{X}'_2 > \bar{X}'_1$.

4. Choose values for $\alpha$, the risk of rejecting lots of acceptable quality, and $\beta$,
the risk of accepting lots of rejectable quality.

5. Solve for n and K as before.

6. The plan specifies n, and indicates acceptance of the lot if $\bar{X} \le K$ and rejection
if $\bar{X} > K$. (For an example of determining this type of plan, see Exercise 16.5.2.)

In some situations, we want to reject lots with either too large a mean or too
small a mean. In other words, there may be both an upper and a lower specification.
For a discussion of plans of this type, which are called double-specification
plans, see the texts suggested in the introduction to this chapter.

The value of n initially specified by a sampling plan may be larger than desired
from a practical or economical point of view. If this occurs, we must increase
either $\alpha$ or $\beta$ or both. Alternatively, we must increase the distance between
acceptable and rejectable means in the single-specification case. Or we must increase
the distance between the two rejectable means in the double-specification plans.

Some people may wish to construct OC curves for plans. [For coverage of this
topic, see the books by Duncan (1974) and Cowden (1957).]

Unknown Sigma Plans

We may not always know the lot standard deviation, as we did in the situations
just discussed. When the standard deviation is unknown, we proceed in essentially
the same manner as for a known sigma plan, except that we use some sample
estimate in place of $\sigma'$. A larger sample size is the price we pay for using an
estimate of $\sigma'$. [Cowden (1957), Duncan (1979), and Grant and Leavenworth
(1980) give the details of finding sampling plans when the lot standard deviation
is unknown.]

Standard Variables Sampling Plans

A number of standard sampling plans for variables sampling are available in
published form. You may select from among these plans the particular one that
best fits your needs and situation. Among the available published plans are those
given by Bowker and Goode (1952) and Lieberman and Resnikoff (1955), and
those contained in the Military Standard 414 (MIL-STD-414) (see United States
Department of Defense, 1957).



MIL-STD-414 has four sections. Section A gives a general description of the
plans. Section B contains variables plans based on the sample standard deviation
for the case in which $\sigma'$ is unknown. Section C contains plans based on the sample
range. Section D gives plans based on the sample mean for the case in which $\sigma'$
is known. The use of MIL-STD-414 lets us (1) calculate a maximum allowable
percent of defective items in a given lot and (2) estimate the percent of defective
items in the presented lot. If the latter exceeds the former, we reject the lot.
Otherwise we accept it.

The use of MIL-STD-414 is illustrated by the following example, in which 
there is a double-specification limit, the variability is unknown, and the sample 
standard deviation is used as the basis for determining the sampling plan. There 
are five inspection levels available. For this example, we shall use level IV, normal 
inspection. 

EXAMPLE 16.5.2 The specifications on the diameter of a certain type of wire cable 
call for an upper limit of 0.363 and a lower limit of 0.357, with a nominal diameter 
of 0.360. The AQL is 2% for both upper and lower specification limits combined. 
We want a sampling plan for incoming lots of size 800. As already indicated, we 
shall assume that the variability is unknown, that we wish to use the normal (IV) 
inspection level, and that we wish to use a plan that uses the standard-deviation 
method rather than the range method. 

The first step is to consult Table A-l of MIL-STD-414 (see Table 16.5.1) to 
see what AQL value to use. We have specified an AQL value of 2. Since this 
falls between 1.65 and 2.79, it should be replaced (according to Table A-l) by 
2.5 for future use in obtaining the sampling plan. 

We next consult Table A-2 of MIL-STD-414 (see Table 16.5.2) to determine 
the code letter for our sample size. The table indicates that when lots of size 800 
are presented and inspection level IV is used, the sample size code letter is J. 


TABLE 16.5.1 AQL conversion table, Table A-1 of MIL-STD-414

For specified AQL values
falling within these ranges      Use this AQL value

      up to  0.049                     0.04
   0.050 to  0.069                     0.065
   0.070 to  0.109                     0.10
   0.110 to  0.164                     0.15
   0.165 to  0.279                     0.25
   0.280 to  0.439                     0.40
   0.440 to  0.699                     0.65
   0.700 to  1.09                      1.0
   1.10  to  1.64                      1.5
   1.65  to  2.79                      2.5
   2.80  to  4.39                      4.0
   4.40  to  6.99                      6.5
   7.00  to 10.9                      10.0
  11.00  to 16.4                      15.0

Source: MIL-STD-414, 11 June 1957
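
The conversion in Table 16.5.1 amounts to a range lookup. The following is a small sketch under stated assumptions: the function name is hypothetical, the first range is taken to start just above zero, and a specified value that falls in the small gap between two printed ranges (for example 0.0495) is assigned to the lower range.

```python
from bisect import bisect_right

# Lower bound of each range in Table 16.5.1, and the AQL value to use for that range.
RANGE_STARTS = [0.0, 0.050, 0.070, 0.110, 0.165, 0.280, 0.440,
                0.700, 1.10, 1.65, 2.80, 4.40, 7.00, 11.00]
AQL_TO_USE = [0.04, 0.065, 0.10, 0.15, 0.25, 0.40, 0.65,
              1.0, 1.5, 2.5, 4.0, 6.5, 10.0, 15.0]
MAX_SPECIFIED_AQL = 16.4  # upper end of the last printed range


def convert_aql(specified_aql):
    """Map a specified AQL (in percent) to the AQL value used in the standard's tables."""
    if not 0 < specified_aql <= MAX_SPECIFIED_AQL:
        raise ValueError("specified AQL is outside the ranges in Table 16.5.1")
    return AQL_TO_USE[bisect_right(RANGE_STARTS, specified_aql) - 1]


print(convert_aql(2.0))  # the specified AQL of Example 16.5.2
```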



TABLE 16.5.2 Sample size code letters,¹ Table A-2 of MIL-STD-414

                               Inspection levels
Lot size                  I     II    III    IV     V

      3 to       8        B     B     B      B      C
      9 to      15        B     B     B      B      D
     16 to      25        B     B     B      C      E
     26 to      40        B     B     B      D      F
     41 to      65        B     B     C      E      G
     66 to     110        B     B     D      F      H
    111 to     180        B     C     E      G      I
    181 to     300        B     D     F      H      J
    301 to     500        C     E     G      I      K
    501 to     800        D     F     H      J      L
    801 to   1,300        E     G     I      K      L
  1,301 to   3,200        F     H     J      L      M
  3,201 to   8,000        G     I     L      M      N
  8,001 to  22,000        H     J     M      N      O
 22,001 to 110,000        I     K     N      O      P
110,001 to 550,000        I     K     O      P      Q
550,001 and over          I     K     P      Q      Q

¹ Sample size code letters given in body of table are applicable when the indicated inspection
levels are to be used.

Source: MIL-STD-414, 11 June 1957
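
The lookup in Table 16.5.2 can be expressed as a short routine. This is an illustrative sketch only: the function and variable names are hypothetical, and the code letters are transcribed from the table above, one string per lot-size row with one character per inspection level (I through V).

```python
from bisect import bisect_left

# Upper lot-size bound of each row except the last ("550,001 and over").
UPPER_LOT_SIZES = [8, 15, 25, 40, 65, 110, 180, 300, 500, 800,
                   1300, 3200, 8000, 22000, 110000, 550000]
CODE_LETTERS = ["BBBBC", "BBBBD", "BBBCE", "BBBDF", "BBCEG", "BBDFH",
                "BCEGI", "BDFHJ", "CEGIK", "DFHJL", "EGIKL", "FHJLM",
                "GILMN", "HJMNO", "IKNOP", "IKOPQ", "IKPQQ"]
LEVELS = ["I", "II", "III", "IV", "V"]


def sample_size_code_letter(lot_size, level="IV"):
    """Return the sample size code letter for a lot size and inspection level."""
    if lot_size < 3:
        raise ValueError("Table 16.5.2 starts at a lot size of 3")
    row = bisect_left(UPPER_LOT_SIZES, lot_size)  # first row whose upper bound >= lot_size
    return CODE_LETTERS[row][LEVELS.index(level)]


print(sample_size_code_letter(800, "IV"))  # lot size and level of Example 16.5.2
```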


We now consult Table B-3 of MIL-STD-414 (see Table 16.5.3). It shows that
for code letter J and an AQL of 2.5, the sample size is 30, and the value of M,
the maximum allowable percent of defective items, is 5.86%.

We now know that we must take from each lot a sample of size 30, compute
an estimate of the percent of defective items in the lot, and compare this value
with 5.86. If the estimate of the percent of defective items in the lot is less than
or equal to 5.86, we accept the lot. Otherwise we reject it.

Suppose that a lot is presented and a sample of size 30 yields the diameter
measurements in Table 16.5.4. From the sample data of Table 16.5.4, we compute
$\bar{x} = 0.362$ and $s = 0.0006$. Next, we calculate the upper and lower quality
indices $Q_U$ and $Q_L$ as follows:

$$Q_U = \frac{U - \bar{x}}{s} \tag{16.5.2}$$

and

$$Q_L = \frac{\bar{x} - L}{s} \tag{16.5.3}$$

Here U = the upper specification limit and L = the lower specification limit.
For the present example, we have

$$Q_U = \frac{0.363 - 0.362}{0.0006} = 1.67, \qquad Q_L = \frac{0.362 - 0.357}{0.0006} = 8.33$$



TABLE 16.5.3 Master table for normal and tightened inspection for plans based on variability
unknown (double specification limit and form 2, single specification limit), standard deviation
method, Table B-3 of MIL-STD-414

Sample size code letter (Code), sample size (Size), and values of M by acceptable quality level
(normal inspection)

Code   Size    .04   .065    .10    .15    .25    .40    .65   1.00   1.50   2.50   4.00   6.50  10.00  15.00

B         3      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓      ↓   7.59  18.86  26.94  33.69  40.47
C         4      ↓      ↓      ↓      ↓      ↓      ↓      ↓   1.53   5.50  10.92  16.45  22.86  29.45  36.90
D         5      ↓      ↓      ↓      ↓      ↓      ↓   1.33   3.32   5.83   9.80  14.39  20.19  26.56  33.99
E         7      ↓      ↓      ↓      ↓  0.422   1.06   2.14   3.55   5.35   8.40  12.20  17.35  23.29  30.50
F        10      ↓      ↓      ↓  0.349  0.716   1.30   2.17   3.26   4.77   7.29  10.54  15.17  20.74  27.57
G        15  0.099  0.186  0.312  0.503  0.818   1.31   2.11   3.05   4.31   6.56   9.46  13.71  18.94  25.61
H        20  0.135  0.228  0.365  0.544  0.846   1.29   2.05   2.95   4.09   6.17   8.92  12.99  18.03  24.53
I        25  0.155  0.250  0.380  0.551  0.877   1.29   2.00   2.86   3.97   5.97   8.63  12.57  17.51  23.97
J        30  0.179  0.280  0.413  0.581  0.879   1.29   1.98   2.83   3.91   5.86   8.47  12.36  17.24  23.58
K        35  0.170  0.264  0.388  0.535  0.847   1.23   1.87   2.68   3.70   5.57   8.10  11.87  16.65  22.91
L        40  0.179  0.275  0.401  0.566  0.873   1.26   1.88   2.71   3.72   5.58   8.09  11.85  16.61  22.36
M        50  0.163  0.250  0.363  0.503  0.789   1.17   1.71   2.49   3.45   5.20   7.61  11.23  15.87  22.00
N        75  0.147  0.228  0.330  0.467  0.720   1.07   1.60   2.29   3.20   4.87   7.15  10.63  15.13  21.11
O       100  0.145  0.220  0.317  0.447  0.689   1.02   1.53   2.20   3.07   4.69   6.91  10.32  14.75  20.66
P       150  0.134  0.203  0.293  0.413  0.638  0.949   1.43   2.05   2.89   4.43   6.57   9.88  14.20  20.02
Q       200  0.135  0.204  0.294  0.414  0.637  0.945   1.42   2.04   2.87   4.40   6.53   9.81  14.12  19.92

               .065    .10    .15    .25    .40    .65   1.00   1.50   2.50   4.00   6.50  10.00  15.00

Acceptable quality levels (tightened inspection)

All AQL and table values are in percent defective.

↓ Use first sampling plan below arrow, that is, both sample size as well as M value. When sample
size equals or exceeds lot size, every item in the lot must be inspected.

Source: MIL-STD-414, 11 June 1957


The next step is to enter Table B-5 of MIL-STD-414 (see Table 16.5.5) with
these values to find the estimated percent of defective items in the lot for $Q_U$ and
$Q_L$. For $Q_U$, we find the estimated percent of defective items in the lot above U
by locating the intersection of the row for 1.67 and the column for sample size
30. The value we seek is 4.48%. In a like manner, we find that the estimated
percent of defective items in the lot below L is 0. We now find the total estimated
percent of defective items in the lot by adding 4.48% and 0% to get 4.48%. We
compare this with 5.86%, the previously found maximum allowable percent of
defective items in the lot. Since 4.48 < 5.86, we accept the lot.

Table A-3 of MIL-STD-414 gives the OC curve for our plan (see Figure 16.5.2).
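
The acceptance decision just described can be sketched in a few lines. This is a hedged illustration with hypothetical names: it copies only three rows of the sample-size-30 column of Table 16.5.5, and it assumes that any quality index of 3.80 or more corresponds to an estimated 0% defective, which is how the text treats $Q_L$ = 8.33.

```python
# Sample-size-30 column of Table 16.5.5, restricted to the rows near the example's Q_U.
ESTIMATED_PERCENT_N30 = {1.60: 5.23, 1.67: 4.48, 1.70: 4.18}


def estimated_percent_defective(q):
    """Estimated percent defective beyond one specification limit, for n = 30."""
    if q >= 3.80:  # assumption: indices at or beyond the zero rows of the table map to 0
        return 0.0
    return ESTIMATED_PERCENT_N30[round(q, 2)]  # KeyError for rows not copied here


def accept_lot(q_upper, q_lower, max_allowable_percent):
    """Return the total estimated percent defective and whether the lot is accepted."""
    total = estimated_percent_defective(q_upper) + estimated_percent_defective(q_lower)
    return total, total <= max_allowable_percent


total, accepted = accept_lot(1.67, 8.33, 5.86)
print(f"estimated percent defective = {total}, accept lot: {accepted}")
```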


TABLE 16.5.4 Diameter measurements obtained in sample of size 30, Example 16.5.2

0.362  0.362  0.363  0.361  0.363  0.363  0.362  0.361  0.362  0.362
0.362  0.362  0.362  0.362  0.363  0.362  0.362  0.362  0.362  0.361
0.361  0.363  0.362  0.361  0.362  0.362  0.362  0.361  0.362  0.362
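
The sample statistics and quality indices for these measurements can be checked directly. The sketch below uses Python's `statistics` module and hypothetical variable names. Because it works with the unrounded mean and standard deviation, the indices it prints (about 1.68 and 8.08) differ somewhat from the 1.67 and 8.33 obtained in the text from the rounded values 0.362 and 0.0006.

```python
from statistics import mean, stdev

# Diameter measurements from Table 16.5.4.
diameters = [
    0.362, 0.362, 0.363, 0.361, 0.363, 0.363, 0.362, 0.361, 0.362, 0.362,
    0.362, 0.362, 0.362, 0.362, 0.363, 0.362, 0.362, 0.362, 0.362, 0.361,
    0.361, 0.363, 0.362, 0.361, 0.362, 0.362, 0.362, 0.361, 0.362, 0.362,
]
upper_limit, lower_limit = 0.363, 0.357

x_bar = mean(diameters)
s = stdev(diameters)  # sample standard deviation (n - 1 in the denominator)
q_upper = (upper_limit - x_bar) / s  # Equation 16.5.2
q_lower = (x_bar - lower_limit) / s  # Equation 16.5.3

print(f"x-bar = {x_bar:.4f}, s = {s:.5f}")
print(f"Q_U = {q_upper:.2f}, Q_L = {q_lower:.2f}")
```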



TABLE 16.5.5 Table for estimating the lot percent defective using standard-deviation method¹

                      Sample size
Q_U or Q_L        10        20        30

   0           50.00     50.00     50.00
   0.10        46.16     46.08     46.05
   0.20        42.35     42.19     42.15
   0.30        38.60     38.37     38.31
   0.40        34.93     34.65     34.58
   0.50        31.37     31.06     30.98
   0.60        27.94     27.63     27.55
   0.70        24.67     24.38     24.31
   0.80        21.57     21.33     21.27
   0.90        18.67     18.50     18.46
   1.00        15.97     15.89     15.88
   1.10        13.50     13.52     13.53
   1.20        11.24     11.38     11.42
   1.30         9.22      9.48      9.55
   1.40         7.44      7.80      7.90
   1.50         5.87      6.34      6.46
   1.60         4.54      5.09      5.23
   1.67         3.73      4.32      4.48
   1.70         3.41      4.02      4.18
   1.80         2.49      3.13      3.30
   1.85         2.09      2.75      2.92
   1.90         1.75      2.40      2.57
   1.95         1.44      2.09      2.26
   2.00         1.17      1.81      1.98
   2.10         0.74      1.34      1.50
   2.20         0.437     0.968     1.120
   2.23         0.366     0.875     1.023
   2.25         0.324     0.816     0.962
   2.30         0.233     0.685     0.823
   2.40         0.109     0.473     0.594
   2.50         0.041     0.317     0.421
   2.60         0.011     0.207     0.293
   2.70         0.001     0.130     0.200
   2.80         0.000     0.079     0.133
   2.90         0.000     0.046     0.087
   3.00         0.000     0.025     0.055
   3.10         0.000     0.013     0.034
   3.20         0.000     0.006     0.020
   3.30         0.000     0.003     0.012
   3.40         0.000     0.001     0.007
   3.50         0.000     0.000     0.003
   3.60         0.000     0.000     0.002
   3.70         0.000     0.000     0.001
   3.80         0.000     0.000     0.000
   3.90         0.000     0.000     0.000

¹ Values tabulated are read in percent.

Source: Abridged from Table B-5 of MIL-STD-414, 11 June 1957.


FIGURE 16.5.2 Operating characteristic curves for sampling plans based on standard-deviation
method, Table A-3 of MIL-STD-414 (Source: MIL-STD-414, Sampling Procedures and Tables for
Inspection by Variables for Percent Defective, Department of Defense, 11 June 1957)



















Exercises

16.5.1 The amounts of pressure required to rupture a certain type of fuel tank, in pounds
per square inch, are normally distributed. Given past data, we accept a value of $\sigma'^2$ =
9200 as the population variance. Any tank that ruptures when subjected to 2500 psi of
pressure or less is unacceptable. We would like 1% or fewer unacceptable tanks in accepted
lots. If more than 5% of the tanks in a lot are defective, we want to reject the lot. Using
the procedure of Example 16.5.1, find a single sampling plan. Let $\alpha$ = 0.01, $\beta$ = 0.10.

16.5.2 A manufacturer of frozen meat pies requires that the ingredients have a moisture
content not exceeding 20%. The standard deviation among packages is known to be 0.05.
One percent or fewer packages per lot with a moisture content greater than 20% is acceptable.
However, if lots in which 2.5% or more packages have a moisture content in
excess of 20% are presented, we want them to be rejected. Using the procedure of Example
16.5.1, find a suitable sampling plan. Let $\alpha$ be 0.025, and $\beta$ be 0.05. What assumptions
are necessary? Do they seem reasonable? If the preliminary sample size is intolerable,
what would you suggest doing?

16.5.3 A certain length of metal chain is considered to be unacceptable if readings above
381 or below 327.5 are recorded when it is subjected to a stress test. Readings are known
to be normally distributed with an unknown standard deviation. Assume an AQL of 1.5%
for both upper and lower specification limits combined, and lots of size 600. Use MIL-STD-414
to find a single sampling plan, using the normal inspection level and the standard-deviation
method. The following sample measurements are observed. Indicate whether or
not the lot should be accepted.

364  382  353  370  353  373  359  346  350  366
327  375  366  349  340  372  336  356  356  364
367  342  342  358  345  376  346  355  361  353


Summary

This chapter introduced the subject of quality control as a logical and meaningful
area in which to apply the concepts and techniques of statistical inference. We
limited treatment of the subject to the concepts of acceptance sampling and control
charts. We covered the use of these techniques for both variables and attributes.
We also introduced the concept of the operating characteristic curve.


Review Questions

1. Define: (a) inspection lot, (b) attribute sampling, (c) sampling plan.

2. Explain, by means of an example, each of the following: (a) single sampling plan,
(b) double sampling plan, (c) multiple sampling plan.

3. What is MIL-STD-105D?

4. What is an operating characteristic curve? How is it used?

5. What is a variables sampling plan?

6. Explain the difference between known and unknown sigma plans.

7. What is MIL-STD-414?

8. What is a control chart?

9. What is meant by the term "assignable cause"?

10. What is an $\bar{x}$ chart?

11. What is an R chart?




12. Why is the range used in constructing control charts for detecting a shift in population 
dispersion? 

13. Briefly outline the steps involved in constructing and using variables control charts. 

14. What is a p chart? 

15. List the basic facts about the sampling distribution of p. 

16. A manufacturer keeps control charts on the weights of 24-ounce packages of a breakfast
cereal filled by an automatic packaging machine. The sample size is 5. Values of $\bar{x}$
and R are computed for each sample. After 25 samples, $\sum \bar{x}$ = 602.5 ounces and $\sum R$ =
17.5 ounces. Assume that the process is in control. Compute 3 sigma control limits for
$\bar{x}$ and R charts.

17. A p chart is to be used to monitor the fraction of defective items in the manufacture
of a certain valve. Random samples of size 100 are selected from each day's production
and inspected. At the end of 25 days, the following results are obtained. (a) Construct a
preliminary control chart for p. (b) Plot the 25 p values on the preliminary control chart.
(c) Are any values outside the control limits? What suggestions would you make if the p
chart is to be used to monitor the process for the fraction of defective valves?


       Number of            Number of            Number of
Day    defectives    Day    defectives    Day    defectives

  1         9          9        22         17        11
  2        12         10        12         18         6
  3        10         11        10         19        14
  4        13         12        14         20        17
  5        11         13        11         21        11
  6        12         14        10         22         9
  7        10         15        16         23        25
  8        14         16        12         24        12
                                           25        10


18. In acceptance sampling for attributes, select a single sampling plan, using normal 
inspection for each of the following situations: (a) Inspection lots of 100 and an AQL of 
4.0. (b) Inspection lots of 400 and an AQL of 0.25. (c) Inspection lots of 5000 and an 
AQL of 0.15. (d) Inspection lots of 700 and an AQL of 10. (e) Inspection lots of 1500 
and an AQL of 0.65. 

19. The manager of a soft-drink bottling plant wants to monitor and warrant the average 
amount of cola put in 16-ounce bottles that are filled in the bottling operation. She plans 
to select a random sample of the bottles filled each day, measure the volume of cola in 
each, and compute the mean amount per bottle. If the mean volume for all bottles filled 
during a day is equal to or greater than 16.04 fluid ounces, she wants to be 95% sure of 
classifying the day’s operation as acceptable. If the mean is less than 16 fluid ounces, she 
wants to be 99% sure of classifying the day’s operation as unacceptable. The standard 
deviation is 0.14 fluid ounce. Find a suitable sampling plan.


TABLE A Cumulative binomial probability distribution

x \ p    0.11    0.12    0.13    0.14    0.15    0.16    0.17    0.18    0.19    0.20

0      0.5584  0.5277  0.4984  0.4704  0.4437  0.4182  0.3939  0.3707  0.3487  0.3277
1      0.9035  0.8875  0.8708  0.8533  0.8352  0.8165  0.7973  0.7776  0.7576  0.7373
2      0.9888  0.9857  0.9821  0.9780  0.9734  0.9682  0.9625  0.9563  0.9495  0.9421
3      0.9993  0.9991  0.9987  0.9983  0.9978  0.9971  0.9964  0.9955  0.9945  0.9933
4      1.0000  1.0000  1.0000  0.9999  0.9999  0.9999  0.9999  0.9998  0.9998  0.9997
5      1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000

x \ p    0.21    0.22    0.23    0.24    0.25    0.26    0.27    0.28    0.29    0.30

0      0.3077  0.2887  0.2707  0.2536  0.2373  0.2219  0.2073  0.1935  0.1804  0.1681
1      0.7167  0.6959  0.6749  0.6539  0.6328  0.6117  0.5907  0.5697  0.5489  0.5282
2      0.9341  0.9256  0.9164  0.9067  0.8965  0.8857  0.8743  0.8624  0.8499  0.8369
3      0.9919  0.9903  0.9886  0.9866  0.9844  0.9819  0.9792  0.9762  0.9728  0.9692
4      0.9996  0.9995  0.9994  0.9992  0.9990  0.9988  0.9986  0.9983  0.9979  0.9976
5      1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000

x \ p    0.31    0.32    0.33    0.34    0.35    0.36    0.37    0.38    0.39    0.40

0      0.1564  0.1454  0.1350  0.1252  0.1160  0.1074  0.0992  0.0916  0.0845  0.0778
1      0.5077  0.4875  0.4675  0.4478  0.4284  0.4094  0.3907  0.3724  0.3545  0.3370
2      0.8234  0.8095  0.7950  0.7801  0.7648  0.7491  0.7330  0.7165  0.6997  0.6826
3      0.9653  0.9610  0.9564  0.9514  0.9460  0.9402  0.9340  0.9274  0.9204  0.9130
4      0.9971  0.9966  0.9961  0.9955  0.9947  0.9940  0.9931  0.9921  0.9910  0.9898
5      1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000





TABLE A (continued)

n = 5 (continued)

x \ p    0.41    0.42    0.43    0.44    0.45    0.46    0.47    0.48    0.49    0.50

0      0.0715  0.0656  0.0602  0.0551  0.0503  0.0459  0.0418  0.0380  0.0345  0.0312
1      0.3199  0.3033  0.2871  0.2714  0.2562  0.2415  0.2272  0.2135  0.2002  0.1875
2      0.6651  0.6475  0.6295  0.6114  0.5931  0.5747  0.5561  0.5375  0.5187  0.5000
3      0.9051  0.8967  0.8879  0.8786  0.8688  0.8585  0.8478  0.8365  0.8247  0.8125
4      0.9884  0.9869  0.9853  0.9835  0.9815  0.9794  0.9771  0.9745  0.9718  0.9688
5      1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
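
Entries in Table A can be recomputed from the binomial formula, which is useful for spotting misprinted values. The following is a minimal sketch with a hypothetical function name; it sums the binomial probabilities from 0 through x and prints the n = 5 rows for p = 0.41 through 0.50.

```python
from math import comb


def cumulative_binomial(x, n, p):
    """P(X <= x) for a binomial random variable with n trials and success probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


# Rebuild the n = 5 block for p = 0.41, 0.42, ..., 0.50.
for x in range(6):
    row = "  ".join(f"{cumulative_binomial(x, 5, p / 100):.4f}" for p in range(41, 51))
    print(x, row)
```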


n = 6 


X 

Xj 

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 


0 

0.9415 

0.8858 

0.8330 

0.7828 

0.7351 

0.6899 

0.6470 

0.6064 

0.5679 

0.5314 


1 

0.9985 

0.9943 

0.9875 

0.9784 

0.9672 

0.9541 

0.9392 

0.9227 

0.9048 

0.8857 


2 

1.0000 

0.9998 

0.9995 

0.9988 

0.9978 

0.9962 

0.9942 

0.9915 

0.9882 

0.9841 


3 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9997 

0.9995 

0.9992 

0.9987 


4 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 


5 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000

x \ p

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 


0 

0.4970 

0.4644 

0.4336 

0.4046 

0.3771 

0.3513 

0.3269 

0.3040 

0.2824 

0.2621 


1 

0.8655 

0.8444 

0.8224 

0.7997 

0.7765 

0.7528 

0.7287 

0.7044 

0.6799 

0.6554 


2 

0.9794 

0.9739 

0.9676 

0.9605 

0.9527 

0.9440 

0.9345 

0.9241 

0.9130 

0.9011 


3 

0.9982 

0.9975 

0.9966 

0.9955

0.9941 

0.9925 

0.9906 

0.9884 

0.9859 

0.9830 


4 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9995 

0.9993 

0.9990 

0.9987 

0.9984 


5 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 


6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000

x \ p

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 


0 

0.2431 

0.2252 

0.2084 

0.1927 

0.1780

0.1642 

0.1513 

0.1393 

0.1281 

0.1176 


1 

0.6308 

0.6063 

0.5820 

0.5578 

0.5339 

0.5104 

0.4872 

0.4644 

0.4420 

0.4202 


2 

0.8885 

0.8750 

0.8609 

0.8461 

0.8306 

0.8144 

0.7977 

0.7804 

0.7626 

0.7443 


3 

0.9798 

0.9761 

0.9720 

0.9674 

0.9624 

0.9569 

0.9508 

0.9443 

0.9372 

0.9295 


4 

0.9980 

0.9975 

0.9969 

0.9962 

0.9954 

0.9944 

0.9933 

0.9921 

0.9907 

0.9891 


5 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

0.9996 

0.9995 

0.9994 

0.9993 


6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000

x \ p

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 


0 

0.1079 

0.0989 

0.0905 

0.0827 

0.0754 

0.0687 

0.0625 

0.0568 

0.0515 

0.0467 


1 

0.3988 

0.3780 

0.3578 

0.3381 

0.3191 

0.3006 

0.2828 

0.2657 

0.2492 

0.2333 


2 

0.7256 

0.7064 

0.6870 

0.6672 

0.6471 

0.6268 

0.6063 

0.5857 

0.5650 

0.5443 


3 

0.9213 

0.9125 

0.9031 

0.8931 

0.8826 

0.8714 

0.8596 

0.8473 

0.8343 

0.8208 


4 

0.9873 

0.9852 

0.9830 

0.9805 

0.9777 

0.9746 

0.9712 

0.9675 

0.9635 

0.9590


5 

0.9991 

0.9989 

0.9987 

0.9985 

0.9982 

0.9978 

0.9974 

0.9970 

0.9965 

0.9959 


6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p


0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 


0 

0.0422 

0.0381 

0.0343 

0.0308 

0.0277 

0.0248 

0.0222 

0.0198 

0.0176 

0.0156 


1 

0.2181 

0.2035 

0.1895 

0.1762 

0.1636 

0.1515 

0.1401 

0.1293 

0.1190 

0.1094 


2

0.5236 

0.5029 

0.4823 

0.4618 

0.4415 

0.4214 

0.4015 

0.3820 

0.3627 

0.3437 






TABLE A

(continued)


n = 6 (continued)


x \ p

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 

3 

0.8067 

0.7920 

0.7768 

0.7610 

0.7447 

0.7280 

0.7107 

0.6930 

0.6748 

0.6562 

4 

0.9542 

0.9490 

0.9434 

0.9373 

0.9308 

0.9238 

0.9163 

0.9083 

0.8997 

0.8906 

5 

0.9952 

0.9945 

0.9937 

0.9927 

0.9917 

0.9905 

0.9892 

0.9878 

0.9862 

0.9844 

6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


n = 7 


x \ p


0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 


0 

0.9321 

0.8681 

0.8080 

0.7514 

0.6983 

0.6485 

0.6017 

0.5578 

0.5168 

0.4783 


1 

0.9980 

0.9921 

0.9829 

0.9706 

0.9556 

0.9382 

0.9187 

0.8974 

0.8745 

0.8503 


2 

1.0000 

0.9997 

0.9991 

0.9980 

0.9962 

0.9937 

0.9903 

0.9860 

0.9807 

0.9743 


3 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9996 

0.9993 

0.9988 

0.9982 

0.9973 


4 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 


5 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 


0 

0.4423 

0.4087 

0.3773 

0.3479 

0.3206 

0.2951 

0.2714 

0.2493 

0.2288 

0.2097 


1 

0.8250 

0.7988 

0.7719 

0.7444 

0.7166 

0.6885 

0.6604 

0.6323 

0.6044 

0.5767 


2 

0.9669 

0.9584 

0.9487 

0.9380 

0.9262 

0.9134 

0.8995 

0.8846 

0.8687 

0.8520 


3 

0.9961 

0.9946 

0.9928 

0.9906 

0.9879 

0.9847 

0.9811 

0.9769 

0.9721 

0.9667 


4 

0.9997 

0.9996 

0.9994 

0.9991 

0.9988 

0.9983 

0.9978 

0.9971 

0.9963 

0.9953 


5 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 


6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p


0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 


0 

0.1920 

0.1757 

0.1605 

0.1465 

0.1335 

0.1215 

0.1105 

0.1003 

0.0910 

0.0824 


1 

0.5494 

0.5225 

0.4960 

0.4702 

0.4449 

0.4204 

0.3965 

0.3734 

0.3510 

0.3294 


2 

0.8343 

0.8159 

0.7967 

0.7769 

0.7564 

0.7354 

0.7139 

0.6919 

0.6696 

0.6471 


3 

0.9606 

0.9539 

0.9464 

0.9383 

0.9294 

0.9198 

0.9095 

0.8984 

0.8866 

0.8740 


4 

0.9942 

0.9928 

0.9912 

0.9893 

0.9871 

0.9847 

0.9819 

0.9787 

0.9752 

0.9712 


5 

0.9995 

0.9994 

0.9992 

0.9989 

0.9987 

0.9983 

0.9979 

0.9974 

0.9969 

0.9962 


6 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 


7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 


0 

0.0745 

0.0672 

0.0606 

0.0546 

0.0490 

0.0440 

0.0394 

0.0352 

0.0314 

0.0280 


1 

0.3086 

0.2887 

0.2696 

0.2513 

0.2338 

0.2172 

0.2013 

0.1863 

0.1721 

0.1586 


2 

0.6243 

0.6013 

0.5783 

0.5553 

0.5323 

0.5094 

0.4866 

0.4641 

0.4419 

0.4199 


3 

0.8606 

0.8466 

0.8318 

0.8163 

0.8002 

0.7833 

0.7659 

0.7479 

0.7293 

0.7102 


4 

0.9668 

0.9620 

0.9566 

0.9508 

0.9444 

0.9375 

0.9299 

0.9218 

0.9131 

0.9037 


5 

0.9954 

0.9945 

0.9935 

0.9923 

0.9910 

0.9895 

0.9877 

0.9858 

0.9836 

0.9812 


6 

0.9997 

0.9997 

0.9996 

0.9995 

0.9994 

0.9992 

0.9991 

0.9989 

0.9986 

0.9984 


7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 


0 

0.0249 

0.0221 

0.0195 

0.0173 

0.0152 

0.0134 

0.0117 

0.0103 

0.0090 

0.0078 


1 

0.1459 

0.1340 

0.1228 

0.1123 

0.1024 

0.0932 

0.0847 

0.0767 

0.0693 

0.0625 


TABLE A

(continued)


n = 7 (continued)


x \ p

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 

2 

0.3983 

0.3771 

0.3564 

0.3362 

0.3164 

0.2973 

0.2787 

0.2607 

0.2433 

0.2266 

3 

0.6906 

0.6706 

0.6502 

0.6294 

0.6083 

0.5869 

0.5654 

0.5437 

0.5219 

0.5000 

4 

0.8937 

0.8831 

0.8718 

0.8598 

0.8471 

0.8337 

0.8197

0.8049 

0.7895 

0.7734 

5 

0.9784 

0.9754 

0.9721 

0.9684 

0.9643 

0.9598 

0.9549 

0.9496 

0.9438 

0.9375 

6 

0.9981 

0.9977 

0.9973 

0.9968 

0.9963 

0.9956 

0.9949 

0.9941 

0.9932 

0.9922 

7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


n = 8


x \ p

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 


0 

0.9227 

0.8508 

0.7837 

0.7214 

0.6634 

0.6096 

0.5596 

0.5132 

0.4703 

0.4305 


1 

0.9973 

0.9897 

0.9777 

0.9619 

0.9428 

0.9208 

0.8965 

0.8702 

0.8423 

0.8131 


2 

0.9999 

0.9996 

0.9987 

0.9969 

0.9942 

0.9904 

0.9853

0.9789 

0.9711 

0.9619 


3 

1.0000 

1.0000 

0.9999 

0.9998 

0.9996 

0.9993 

0.9987 

0.9978 

0.9966 

0.9950 


4 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9997 

0.9996 


5 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p


0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 


0 

0.3937 

0.3596 

0.3282 

0.2992 

0.2725 

0.2479 

0.2252 

0.2044 

0.1853 

0.1678 


1 

0.7829 

0.7520 

0.7206 

0.6889 

0.6572 

0.6256 

0.5943 

0.5634 

0.5330 

0.5033 


2 

0.9513 

0.9392 

0.9257 

0.9109 

0.8948 

0.8774 

0.8588 

0.8392 

0.8185 

0.7969 


3 

0.9929 

0.9903 

0.9871 

0.9832 

0.9786 

0.9733 

0.9672 

0.9603 

0.9524 

0.9437 


4 

0.9993 

0.9990 

0.9985 

0.9979 

0.9971 

0.9962 

0.9950

0.9935 

0.9917 

0.9896 


5 

1.0000 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

0.9995 

0.9993 

0.9991 

0.9988 


6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 


7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000

1.0000 

1.0000 

1.0000 

x \ p

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 


0 

0.1517 

0.1370 

0.1236 

0.1113 

0.1001 

0.0899 

0.0806 

0.0722 

0.0646 

0.0576 


1

0.4743 

0.4462 

0.4189 

0.3925 

0.3671 

0.3427 

0.3193 

0.2969 

0.2756 

0.2553 


2 

0.7745 

0.7514 

0.7276 

0.7033 

0.6785 

0.6535 

0.6282 

0.6027 

0.5772 

0.5518 


3 

0.9341 

0.9235 

0.9120 

0.8996 

0.8862 

0.8719 

0.8567 

0.8406 

0.8237 

0.8059 


4 

0.9871 

0.9842 

0.9809 

0.9770 

0.9727 

0.9678 

0.9623 

0.9562 

0.9495 

0.9420 


5 

0.9984 

0.9979 

0.9973 

0.9966 

0.9958 

0.9948 

0.9936 

0.9922 

0.9906 

0.9887 


6 

0.9999 

0.9998 

0.9998 

0.9997 

0.9996 

0.9995 

0.9994 

0.9992 

0.9990

0.9987 


7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000

1.0000 

0.9999 

0.9999 


8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000

1.0000 

1.0000 

1.0000 

x \ p

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 


0 

0.0514 

0.0457 

0.0406 

0.0360 

0.0319 

0.0281 

0.0248 

0.0218 

0.0192 

0.0168 


1 

0.2360 

0.2178 

0.2006 

0.1844 

0.1691 

0.1548 

0.1414 

0.1289 

0.1172 

0.1064 


2 

0.5264 

0.5013 

0.4764 

0.4519 

0.4278 

0.4042 

0.3811 

0.3585 

0.3366 

0.3154 


3 

0.7874 

0.7681 

0.7481 

0.7276 

0.7064 

0.6847 

0.6626 

0.6401 

0.6172 

0.5941 


4 

0.9339 

0.9250 

0.9154 

0.9051 

0.8939 

0.8820 

0.8693 

0.8557 

0.8414 

0.8263 


5 

0.9866 

0.9841 

0.9813 

0.9782 

0.9747 

0.9707 

0.9664 

0.9615 

0.9561 

0.9502 


6 

0.9984 

0.9980 

0.9976 

0.9970 

0.9964 

0.9957 

0.9949 

0.9939 

0.9928 

0.9915 


7 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

0.9996 

0.9996 

0.9995 

0.9993 


8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 





TABLE A 

(continued) 


n = 8 (continued) 


x \ p

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 

0 

0.0147 

0.0128 

0.0111 

0.0097 

0.0084 

0.0072 

0.0062 

0.0053 

0.0046 

0.0039 

1 

0.0963 

0.0870 

0.0784 

0.0705 

0.0632 

0.0565 

0.0504 

0.0448 

0.0398 

0.0352 

2 

0.2948 

0.2750 

0.2560 

0.2376 

0.2201 

0.2034 

0.1875 

0.1724 

0.1581 

0.1445 

3 

0.5708 

0.5473 

0.5238 

0.5004 

0.4770 

0.4537 

0.4306 

0.4078 

0.3854 

0.3633 

4 

0.8105 

0.7938 

0.7765 

0.7584 

0.7396 

0.7202 

0.7001 

0.6795 

0.6584 

0.6367 

5 

0.9437 

0.9366 

0.9289 

0.9206 

0.9115 

0.9018 

0.8914 

0.8802 

0.8682 

0.8555 

6 

0.9900 

0.9883 

0.9864 

0.9843 

0.9819 

0.9792 

0.9761 

0.9728 

0.9690 

0.9648 

7 

0.9992 

0.9990 

0.9988 

0.9986 

0.9983 

0.9980 

0.9976 

0.9972 

0.9967 

0.9961 

8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


n = 9


x \ p


0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 


0 

0.9135 

0.8337 

0.7602 

0.6925 

0.6302 

0.5730 

0.5204 

0.4722 

0.4279 

0.3874 


1 

0.9966 

0.9869 

0.9718 

0.9522 

0.9288 

0.9022 

0.8729 

0.8417 

0.8088 

0.7748 


2 

0.9999 

0.9994 

0.9980 

0.9955 

0.9916 

0.9862 

0.9791 

0.9702 

0.9595 

0.9470 


3 

1.0000 

1.0000 

0.9999 

0.9997 

0.9994 

0.9987 

0.9977 

0.9963 

0.9943 

0.9917 


4 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9997 

0.9995 

0.9991 


5 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 


6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p


0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 


0

0.3504 

0.3165 

0.2855 

0.2573 

0.2316 

0.2082 

0.1869 

0.1676 

0.1501 

0.1342 


1 

0.7401 

0.7049 

0.6696 

0.6343 

0.5995 

0.5652 

0.5315 

0.4988 

0.4670 

0.4362 


2 

0.9327 

0.9167 

0.8991 

0.8798 

0.8591 

0.8371 

0.8139 

0.7895 

0.7643 

0.7382 


3 

0.9883 

0.9842 

0.9791 

0.9731 

0.9661 

0.9580 

0.9488 

0.9385 

0.9270 

0.9144 


4 

0.9986 

0.9979 

0.9970 

0.9959 

0.9944 

0.9925 

0.9902 

0.9875 

0.9842 

0.9804 


5 

0.9999 

0.9998 

0.9997 

0.9996 

0.9994 

0.9991 

0.9987 

0.9983 

0.9977 

0.9969 


6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 


7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 


0 

0.1199 

0.1069 

0.0952 

0.0846 

0.0751 

0.0665 

0.0589 

0.0520 

0.0458 

0.0404 


1 

0.4066 

0.3782 

0.3509 

0.3250 

0.3003 

0.2770 

0.2548 

0.2340 

0.2144 

0.1960 


2 

0.7115 

0.6842 

0.6566 

0.6287 

0.6007 

0.5727 

0.5448 

0.5171 

0.4898 

0.4628 


3 

0.9006 

0.8856 

0.8696 

0.8525 

0.8343 

0.8151 

0.7950 

0.7740 

0.7522 

0.7297 


4 

0.9760 

0.9709 

0.9650 

0.9584 

0.9511 

0.9429 

0.9338 

0.9238 

0.9130 

0.9012 


5 

0.9960 

0.9949 

0.9935 

0.9919 

0.9900 

0.9878 

0.9851 

0.9821 

0.9787 

0.9747 


6 

0.9996 

0.9994 

0.9992 

0.9990 

0.9987 

0.9983 

0.9978 

0.9972 

0.9965 

0.9957 


7 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9997 

0.9996 


8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 


0 

0.0355 

0.0311 

0.0272 

0.0238 

0.0207 

0.0180 

0.0156 

0.0135 

0.0117 

0.0101 


1 

0.1788 

0.1628 

0.1478 

0.1339 

0.1211 

0.1092 

0.0983 

0.0882 

0.0790 

0.0705 


2 

0.4364 

0.4106 

0.3854 

0.3610 

0.3373 

0.3144 

0.2924 

0.2713 

0.2511 

0.2318 


3 

0.7065 

0.6827 

0.6585 

0.6338 

0.6089 

0.5837 

0.5584 

0.5331 

0.5078 

0.4826 


4 

0.8885 

0.8748 

0.8602 

0.8447 

0.8283 

0.8110 

0.7928 

0.7738 

0.7540 

0.7334 


5 

0.9702 

0.9652 

0.9596 

0.9533 

0.9464 

0.9388 

0.9304 

0.9213 

0.9114 

0.9006 


6 

0.9947 

0.9936 

0.9922 

0.9906 

0.9888 

0.9867 

0.9843 

0.9816 

0.9785 

0.9750 



TABLE A 

(continued)


n = 9 (continued)


x \ p

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 

7 

0.9994 

0.9993 

0.9991 

0.9989 

0.9986 

0.9983 

0.9979 

0.9974 

0.9969 

0.9962 

8 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 

0 

0.0087 

0.0074 

0.0064 

0.0054 

0.0046 

0.0039 

0.0033 

0.0028 

0.0023 

0.0020 

1 

0.0628 

0.0558 

0.0495 

0.0437 

0.0385 

0.0338 

0.0296 

0.0259 

0.0225 

0.0195 

2 

0.2134 

0.1961 

0.1796 

0.1641 

0.1495 

0.1358 

0.1231 

0.1111 

0.1001 

0.0898 

3 

0.4576 

0.4330 

0.4087 

0.3848 

0.3614 

0.3386 

0.3164 

0.2948 

0.2740 

0.2539 

4 

0.7122 

0.6903 

0.6678 

0.6449 

0.6214 

0.5976 

0.5735 

0.5491 

0.5246 

0.5000 

5 

0.8891 

0.8767 

0.8634 

0.8492 

0.8342 

0.8183 

0.8015 

0.7839 

0.7654 

0.7461 

6 

0.9710 

0.9666 

0.9617 

0.9563 

0.9502 

0.9436 

0.9363 

0.9283 

0.9196 

0.9102 

7 

0.9954 

0.9945 

0.9935 

0.9923 

0.9909 

0.9893 

0.9875 

0.9855 

0.9831 

0.9805 

8 

0.9997 

0.9996 

0.9995 

0.9994 

0.9992 

0.9991 

0.9989 

0.9986 

0.9984 

0.9980 

9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


n = 10


x \ p


0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 


0 

0.9044 

0.8171 

0.7374 

0.6648 

0.5987 

0.5386 

0.4840 

0.4344 

0.3894 

0.3487 


1 

0.9957 

0.9838 

0.9655 

0.9418 

0.9139 

0.8824 

0.8483 

0.8121 

0.7746 

0.7361 


2 

0.9999 

0.9991 

0.9972 

0.9938 

0.9885 

0.9812 

0.9717 

0.9599 

0.9460 

0.9298 


3 

1.0000 

1.0000 

0.9999 

0.9996 

0.9990 

0.9980 

0.9964 

0.9942 

0.9912 

0.9872 


4 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9997 

0.9994 

0.9990 

0.9984 


5 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 


6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p


0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 


0 

0.3118 

0.2785 

0.2484 

0.2213 

0.1969 

0.1749 

0.1552 

0.1374 

0.1216 

0.1074 


1 

0.6972 

0.6583 

0.6196 

0.5816 

0.5443 

0.5080 

0.4730 

0.4392 

0.4068 

0.3758 


2 

0.9116 

0.8913 

0.8692 

0.8455 

0.8202 

0.7936 

0.7659 

0.7372 

0.7078 

0.6778 


3 

0.9822 

0.9761 

0.9687 

0.9600 

0.9500 

0.9386 

0.9259 

0.9117 

0.8961 

0.8791 


4 

0.9975 

0.9963 

0.9947 

0.9927 

0.9901 

0.9870 

0.9832 

0.9787 

0.9734 

0.9672 


5 

0.9997 

0.9996 

0.9994 

0.9990 

0.9986 

0.9980 

0.9973 

0.9963 

0.9951 

0.9936 


6 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9994 

0.9991 


7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 


8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p


0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 


0 

0.0947 

0.0834 

0.0733 

0.0643 

0.0563 

0.0492 

0.0430 

0.0374 

0.0326 

0.0282 


1 

0.3464 

0.3185 

0.2921 

0.2673 

0.2440 

0.2222 

0.2019 

0.1830 

0.1655 

0.1493 


2 

0.6474 

0.6169 

0.5863 

0.5558 

0.5256 

0.4958 

0.4665 

0.4378 

0.4099 

0.3828


3 

0.8609 

0.8413 

0.8206 

0.7988 

0.7759 

0.7521 

0.7274 

0.7021 

0.6761 

0.6496 


4 

0.9601 

0.9521 

0.9431 

0.9330 

0.9219 

0.9096 

0.8963 

0.8819 

0.8663 

0.8497 


5 

0.9918 

0.9896 

0.9870 

0.9839 

0.9803 

0.9761 

0.9713 

0.9658 

0.9596 

0.9527 


6 

0.9988 

0.9984 

0.9979 

0.9973 

0.9965 

0.9955 

0.9944 

0.9930 

0.9913 

0.9894 


7 

0.9999 

0.9998 

0.9998 

0.9997 

0.9996 

0.9994 

0.9993 

0.9990 

0.9988 

0.9984 


8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9999 


9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 



TABLE A

(continued)


n = 10 (continued)


x \ p

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 

0 

0.0245 

0.0211 

0.0182 

0.0157 

0.0135 

0.0115 

0.0098 

0.0084 

0.0071 

0.0060 

1 

0.1344 

0.1206 

0.1080 

0.0965 

0.0860 

0.0764 

0.0677 

0.0598 

0.0527 

0.0464 

2 

0.3566 

0.3313 

0.3070 

0.2838 

0.2616 

0.2405 

0.2206 

0.2017 

0.1840 

0.1673 

3 

0.6228 

0.5956 

0.5684 

0.5411 

0.5138 

0.4868 

0.4600 

0.4336 

0.4077 

0.3823 

4 

0.8321 

0.8133 

0.7936 

0.7730 

0.7515 

0.7292 

0.7061 

0.6823 

0.6580 

0.6331 

5 

0.9449 

0.9363 

0.9268

0.9164 

0.9051 

0.8928 

0.8795 

0.8652 

0.8500 

0.8338 

6 

0.9871 

0.9845 

0.9815 

0.9780 

0.9740 

0.9695 

0.9644 

0.9587 

0.9523 

0.9452 

7 

0.9980 

0.9975 

0.9968 

0.9961 

0.9952 

0.9941 

0.9929 

0.9914 

0.9897 

0.9877 

8 

0.9998 

0.9997 

0.9997 

0.9996 

0.9995

0.9993 

0.9991 

0.9989 

0.9986 

0.9983 

9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

10 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50

0 

0.0051 

0.0043 

0.0036 

0.0030 

0.0025 

0.0021 

0.0017 

0.0014 

0.0012 

0.0010 

1 

0.0406 

0.0355 

0.0309 

0.0269 

0.0233 

0.0201 

0.0173 

0.0148 

0.0126 

0.0107 

2 

0.1517 

0.1372 

0.1236 

0.1111 

0.0996 

0.0889 

0.0791 

0.0702 

0.0621 

0.0547 

3 

0.3575 

0.3335 

0.3102 

0.2877 

0.2660 

0.2453 

0.2255 

0.2067 

0.1888 

0.1719 

4 

0.6078 

0.5822 

0.5564 

0.5304 

0.5044 

0.4784 

0.4526 

0.4270 

0.4018 

0.3770 

5 

0.8166 

0.7984 

0.7793 

0.7593 

0.7384 

0.7168 

0.6943 

0.6712 

0.6474 

0.6230 

6 

0.9374 

0.9288 

0.9194 

0.9092 

0.8980 

0.8859 

0.8729 

0.8590 

0.8440 

0.8281 

7 

0.9854 

0.9828 

0.9798 

0.9764 

0.9726 

0.9683 

0.9634 

0.9580 

0.9520 

0.9453 

8 

0.9979 

0.9975 

0.9969 

0.9963 

0.9955 

0.9946 

0.9935 

0.9923 

0.9909 

0.9893 

9 

0.9999 

0.9998 

0.9998 

0.9997 

0.9997 

0.9996 

0.9995 

0.9994 

0.9992 

0.9990 

10 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


n = 11


x \ p

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 


0 

0.8953 

0.8007 

0.7153 

0.6382 

0.5688 

0.5063 

0.4501 

0.3996 

0.3544 

0.3138 


1 

0.9948 

0.9805 

0.9587 

0.9308 

0.8981 

0.8618 

0.8228 

0.7819 

0.7399 

0.6974 


2 

0.9998 

0.9988 

0.9963 

0.9917 

0.9848 

0.9752 

0.9630 

0.9481 

0.9305 

0.9104 


3 

1.0000 

1.0000 

0.9998 

0.9993 

0.9984 

0.9970 

0.9947 

0.9915 

0.9871 

0.9815 


4 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9997 

0.9995 

0.9990 

0.9983 

0.9972 


5 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9997 


6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 


0 

0.2775 

0.2451 

0.2161 

0.1903 

0.1673 

0.1469 

0.1288 

0.1127 

0.0985 

0.0859 


1 

0.6548 

0.6127 

0.5714 

0.5311 

0.4922 

0.4547 

0.4189 

0.3849 

0.3526 

0.3221 


2 

0.8880 

0.8634 

0.8368 

0.8085 

0.7788 

0.7479 

0.7161 

0.6836 

0.6506 

0.6174 


3 

0.9744 

0.9659 

0.9558 

0.9440 

0.9306 

0.9154 

0.8987 

0.8803 

0.8603 

0.8389 


4 

0.9958 

0.9939 

0.9913 

0.9881 

0.9841 

0.9793 

0.9734 

0.9666 

0.9587 

0.9496 


5 

0.9995 

0.9992 

0.9988 

0.9982 

0.9973 

0.9963 

0.9949 

0.9932 

0.9910 

0.9883 


6 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9995 

0.9993 

0.9990 

0.9986 

0.9980 


7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9998 


8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 


0 

0.0748 

0.0650 

0.0564 

0.0489 

0.0422 

0.0364 

0.0314 

0.0270 

0.0231 

0.0198 


1 

0.2935 

0.2667 

0.2418 

0.2186 

0.1971 

0.1773 

0.1590 

0.1423 

0.1270 

0.1130 



TABLE A 

(continued) 


n = 11 (continued) 


x \ p

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 

2 

0.5842 

0.5512 

0.5186 

0.4866 

0.4552 

0.4247 

0.3951 

0.3665 

0.3390 

0.3127 

3 

0.8160 

0.7919 

0.7667 

0.7404 

0.7133 

0.6854 

0.6570 

0.6281 

0.5989 

0.5696 

4 

0.9393 

0.9277 

0.9149 

0.9008 

0.8854 

0.8687 

0.8507 

0.8315 

0.8112 

0.7897 

5 

0.9852 

0.9814 

0.9769 

0.9717 

0.9657 

0.9588 

0.9510 

0.9423 

0.9326 

0.9218 

6 

0.9973 

0.9965 

0.9954 

0.9941 

0.9924 

0.9905 

0.9881 

0.9854 

0.9821 

0.9784 

7 

0.9997 

0.9995 

0.9993 

0.9991 

0.9988 

0.9984 

0.9979 

0.9973 

0.9966 

0.9957 

8 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

0.9996 

0.9994 

9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 

0 

0.0169 

0.0144 

0.0122 

0.0104 

0.0088 

0.0074 

0.0062 

0.0052 

0.0044 

0.0036 

1 

0.1003 

0.0888 

0.0784 

0.0690 

0.0606 

0.0530 

0.0463 

0.0403 

0.0350 

0.0302 

2 

0.2877 

0.2639 

0.2413

0.2201 

0.2001 

0.1814 

0.1640 

0.1478 

0.1328 

0.1189 

3 

0.5402 

0.5110 

0.4821 

0.4536 

0.4256 

0.3981 

0.3714 

0.3455 

0.3204 

0.2963 

4 

0.7672 

0.7437 

0.7193 

0.6941 

0.6683 

0.6419 

0.6150 

0.5878 

0.5603 

0.5328 

5 

0.9099 

0.8969 

0.8829 

0.8676 

0.8513 

0.8339 

0.8153 

0.7957 

0.7751 

0.7535 

6 

0.9740 

0.9691 

0.9634 

0.9570 

0.9499 

0.9419 

0.9330 

0.9232 

0.9124 

0.9006 

7 

0.9946 

0.9933 

0.9918 

0.9899 

0.9878 

0.9852 

0.9823 

0.9790 

0.9751 

0.9707 

8 

0.9992 

0.9990 

0.9987 

0.9984 

0.9980 

0.9974 

0.9968 

0.9961 

0.9952 

0.9941 

9 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

0.9996 

0.9995 

0.9994 

0.9993 

10 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 

0 

0.0030 

0.0025 

0.0021 

0.0017 

0.0014 

0.0011 

0.0009 

0.0008 

0.0006 

0.0005 

1 

0.0261 

0.0224 

0.0192 

0.0164 

0.0139 

0.0118 

0.0100 

0.0084 

0.0070 

0.0059 

2 

0.1062 

0.0945 

0.0838 

0.0740 

0.0652 

0.0572 

0.0501 

0.0436 

0.0378 

0.0327 

3 

0.2731 

0.2510 

0.2300 

0.2100 

0.1911 

0.1734 

0.1567 

0.1412 

0.1267 

0.1133 

4 

0.5052 

0.4777 

0.4505 

0.4236 

0.3971 

0.3712 

0.3459 

0.3213 

0.2974 

0.2744 

5 

0.7310 

0.7076 

0.6834 

0.6586 

0.6331 

0.6071 

0.5807 

0.5540 

0.5271 

0.5000 

6 

0.8879 

0.8740 

0.8592 

0.8432 

0.8262

0.8081 

0.7890 

0.7688 

0.7477 

0.7256 

7 

0.9657 

0.9601 

0.9539 

0.9468 

0.9390 

0.9304 

0.9209 

0.9105 

0.8991 

0.8867 

8 

0.9928 

0.9913 

0.9896 

0.9875 

0.9852 

0.9825 

0.9794 

0.9759 

0.9718 

0.9673 

9 

0.9991 

0.9988 

0.9986 

0.9982 

0.9978 

0.9973 

0.9967 

0.9960 

0.9951 

0.9941 

10 

0.9999 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 

0.9998 

0.9997 

0.9996 

0.9995 

11 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


n = 12


x \ p

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 

0 

0.8864 

0.7847 

0.6938 

0.6127 

0.5404 

0.4759 

0.4186 

0.3677 

0.3225 

0.2824 

1 

0.9938 

0.9769 

0.9514 

0.9191 

0.8816 

0.8405 

0.7967 

0.7513 

0.7052 

0.6590 

2 

0.9998 

0.9985 

0.9952 

0.9893 

0.9804 

0.9684 

0.9532 

0.9348 

0.9134 

0.8891 

3 

1.0000 

0.9999 

0.9997 

0.9990 

0.9978 

0.9957 

0.9925 

0.9880 

0.9820 

0.9744 

4 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9996 

0.9991 

0.9984 

0.9973 

0.9957 

5 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999

0.9998 

0.9997 

0.9995 

6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 

0 

0.2470 

0.2157 

0.1880 

0.1637 

0.1422 

0.1234 

0.1069 

0.0924 

0.0798 

0.0687 






TABLE A

(continued)


n = 12 (continued)


x \ p


0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 


1 

0.6133 

0.5686 

0.5252 

0.4834 

0.4435 

0.4055 

0.3696 

0.3359 

0.3043 

0.2749 


2 

0.8623 

0.8333 

0.8023 

0.7697 

0.7358 

0.7010 

0.6656 

0.6298 

0.5940 

0.5583 


3 

0.9649 

0.9536 

0.9403 

0.9250 

0.9078 

0.8886 

0.8676 

0.8448 

0.8205 

0.7946 


4 

0.9935 

0.9905 

0.9867 

0.9819 

0.9761 

0.9690 

0.9607 

0.9511 

0.9400 

0.9274 


5 

0.9991 

0.9986 

0.9978 

0.9967 

0.9954 

0.9935 

0.9912 

0.9884 

0.9849 

0.9806 


6 

0.9999 

0.9998 

0.9997 

0.9996 

0.9993 

0.9990 

0.9985 

0.9979 

0.9971 

0.9961 


7 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9994 


8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 


9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000

x \ p

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 


0 

0.0591 

0.0507 

0.0434 

0.0371 

0.0317 

0.0270 

0.0229 

0.0194 

0.0164 

0.0138 


1 

0.2476 

0.2224 

0.1991 

0.1778 

0.1584 

0.1406 

0.1245 

0.1100 

0.0968 

0.0850 


2 

0.5232 

0.4886 

0.4550 

0.4222 

0.3907 

0.3603 

0.3313 

0.3037 

0.2775 

0.2528 


3 

0.7674 

0.7390 

0.7096 

0.6795 

0.6488 

0.6176 

0.5863 

0.5548 

0.5235 

0.4925 


4 

0.9134 

0.8979 

0.8808 

0.8623 

0.8424 

0.8210 

0.7984 

0.7746 

0.7496 

0.7237 


5 

0.9755 

0.9696 

0.9626 

0.9547 

0.9456 

0.9354 

0.9240 

0.9113 

0.8974 

0.8822 


6 

0.9948 

0.9932 

0.9911 

0.9887 

0.9857 

0.9822 

0.9781 

0.9733 

0.9678 

0.9614 


7 

0.9992 

0.9989 

0.9984 

0.9979 

0.9972 

0.9964 

0.9953 

0.9940 

0.9924 

0.9905 


8 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9995 

0.9993 

0.9990 

0.9987 

0.9983 


9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 


10 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 


0 

0.0116 

0.0098 

0.0082 

0.0068 

0.0057 

0.0047 

0.0039 

0.0032 

0.0027 

0.0022 


1

0.0744 

0.0650 

0.0565 

0.0491 

0.0424 

0.0366 

0.0315 

0.0270 

0.0230 

0.0196 


2 

0.2296 

0.2078 

0.1876 

0.1687 

0.1513 

0.1352 

0.1205 

0.1069 

0.0946 

0.0834 


3 

0.4619 

0.4319 

0.4027 

0.3742 

0.3467 

0.3201 

0.2947 

0.2704 

0.2472 

0.2253 


4 

0.6968 

0.6692 

0.6410 

0.6124 

0.5833 

0.5541 

0.5249 

0.4957 

0.4668 

0.4382 


5 

0.8657 

0.8479 

0.8289 

0.8087 

0.7873 

0.7648 

0.7412 

0.7167 

0.6913 

0.6652 


6 

0.9542 

0.9460 

0.9368 

0.9266 

0.9154 

0.9030 

0.8894 

0.8747 

0.8589 

0.8418 


7 

0.9882 

0.9856 

0.9824 

0.9787 

0.9745 

0.9696 

0.9641 

0.9578 

0.9507 

0.9427 


8 

0.9978 

0.9972 

0.9964 

0.9955 

0.9944 

0.9930 

0.9915 

0.9896 

0.9873 

0.9847 


9 

0.9997 

0.9996 

0.9995 

0.9993 

0.9992 

0.9989 

0.9986 

0.9982 

0.9978 

0.9972 


10 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 


11 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 


0 

0.0018 

0.0014 

0.0012 

0.0010 

0.0008 

0.0006 

0.0005 

0.0004 

0.0003 

0.0002 


1 

0.0166 

0.0140 

0.0118 

0.0099 

0.0083 

0.0069 

0.0057 

0.0047 

0.0039 

0.0032 


2 

0.0733 

0.0642 

0.0560 

0.0487 

0.0421 

0.0363 

0.0312 

0.0267 

0.0227 

0.0193 


3 

0.2047 

0.1853 

0.1671 

0.1502 

0.1345 

0.1199 

0.1066 

0.0943 

0.0832 

0.0730 


4 

0.4101 

0.3825 

0.3557 

0.3296 

0.3044 

0.2802 

0.2570 

0.2348 

0.2138 

0.1938 


5 

0.6384 

0.6111 

0.5833 

0.5552 

0.5269 

0.4986 

0.4703 

0.4423 

0.4145 

0.3872 


6 

0.8235 

0.8041 

0.7836 

0.7620 

0.7393 

0.7157 

0.6911 

0.6657 

0.6396 

0.6128 


7 

0.9338 

0.9240 

0.9131 

0.9012 

0.8883 

0.8742 

0.8589 

0.8425 

0.8249 

0.8062 


8 

0.9817 

0.9782 

0.9742 

0.9696 

0.9644 

0.9585 

0.9519 

0.9445 

0.9362 

0.9270 


9 

0.9965 

0.9957 

0.9947 

0.9935 

0.9921 

0.9905 

0.9886 

0.9863 

0.9837 

0.9807 


10 

0.9996 

0.9995 

0.9993 

0.9991 

0.9989 

0.9986 

0.9983 

0.9979 

0.9974 

0.9968 


11 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 


12 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 



TABLE A 

(continued)


n = 13 


X 

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 

0 

0.8775 

0.7690 

0.6730 

0.5882 

0.5133 

0.4474 

0.3893 

0.3383 

0.2935 

0.2542 

1 

0.9928 

0.9730 

0.9436 

0.9068 

0.8646 

0.8186 

0.7702 

0.7206 

0.6707 

0.6213 

2 

0.9997 

0.9980 

0.9938 

0.9865 

0.9755 

0.9608 

0.9422 

0.9201 

0.8946 

0.8661 

3 

1.0000 

0.9999 

0.9995 

0.9986 

0.9969 

0.9940 

0.9897 

0.9837 

0.9758 

0.9658 

4 

1.0000 

1.0000 

1.0000 

0.9999

0.9997 

0.9993 

0.9987

0.9976 

0.9959 

0.9935 

5 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9997 

0.9995 

0.9991 

6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

7 

1.0000 

1.0000 

1.0000 

1.0000

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x 

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 

0 

0.2198 

0.1898 

0.1636 

0.1408 

0.1209 

0.1037 

0.0887 

0.0758 

0.0646 

0.0550 

1 

0.5730 

0.5262 

0.4814 

0.4386 

0.3983 

0.3604 

0.3249 

0.2920 

0.2616 

0.2336 

2 

0.8349 

0.8015 

0.7663 

0.7296 

0.6920 

0.6537 

0.6152 

0.5769 

0.5389 

0.5017 

3 

0.9536 

0.9391 

0.9224 

0.9033 

0.8820 

0.8586 

0.8333 

0.8061 

0.7774 

0.7473 

4 

0.9903 

0.9861 

0.9807 

0.9740 

0.9658 

0.9562 

0.9449 

0.9319 

0.9173 

0.9009 

5 

0.9985 

0.9976 

0.9964 

0.9947 

0.9925 

0.9896 

0.9861 

0.9817 

0.9763 

0.9700 

6 

0.9998 

0.9997 

0.9995 

0.9992 

0.9987 

0.9981 

0.9973 

0.9962 

0.9948 

0.9930 

7 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9994 

0.9991 

0.9988 

8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000

x \ p

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 

0 

0.0467 

0.0396 

0.0334 

0.0282 

0.0238 

0.0200 

0.0167 

0.0140 

0.0117 

0.0097 

1 

0.2080 

0.1846 

0.1633 

0.1441 

0.1267 

0.1111 

0.0971 

0.0846 

0.0735 

0.0637 

2 

0.4653 

0.4301 

0.3961 

0.3636 

0.3326 

0.3032 

0.2755

0.2495 

0.2251 

0.2025 

3 

0.7161 

0.6839 

0.6511 

0.6178 

0.5843 

0.5507 

0.5174 

0.4845 

0.4522 

0.4206 

4 

0.8827 

0.8629 

0.8415 

0.8184 

0.7940 

0.7681 

0.7411 

0.7130 

0.6840 

0.6543 

5 

0.9625 

0.9538 

0.9438 

0.9325 

0.9198 

0.9056 

0.8901 

0.8730 

0.8545 

0.8346 

6 

0.9907 

0.9880 

0.9846 

0.9805 

0.9757 

0.9701 

0.9635 

0.9560 

0.9473 

0.9376 

7 

0.9983 

0.9976 

0.9968 

0.9957 

0.9944 

0.9927 

0.9907 

0.9882 

0.9853 

0.9818 

8 

0.9998 

0.9996 

0.9995 

0.9993 

0.9990 

0.9987 

0.9982 

0.9976 

0.9969 

0.9960 

9 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9995 

0.9993 

10 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

11 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000

1.0000 

1.0000 

1.0000 

x \ p

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 

0 

0.0080 

0.0066 

0.0055 

0.0045 

0.0037 

0.0030 

0.0025 

0.0020 

0.0016 

0.0013 

1 

0.0550 

0.0473 

0.0406 

0.0347 

0.0296 

0.0251 

0.0213 

0.0179 

0.0151 

0.0126 

2 

0.1815 

0.1621 

0.1443 

0.1280 

0.1132 

0.0997 

0.0875 

0.0765 

0.0667 

0.0579 

3 

0.3899 

0.3602 

0.3317 

0.3043 

0.2783 

0.2536 

0.2302 

0.2083 

0.1877 

0.1686 

4 

0.6240 

0.5933 

0.5624 

0.5314 

0.5005 

0.4699 

0.4397 

0.4101 

0.3812 

0.3530 

5 

0.8133 

0.7907 

0.7669 

0.7419 

0.7159 

0.6889 

0.6612 

0.6327 

0.6038 

0.5744 

6 

0.9267 

0.9146 

0.9012 

0.8865 

0.8705 

0.8532 

0.8346 

0.8147 

0.7935 

0.7712 

7 

0.9777 

0.9729 

0.9674 

0.9610 

0.9538 

0.9456 

0.9365 

0.9262 

0.9149 

0.9023 

8 

0.9948 

0.9935 

0.9918 

0.9898 

0.9874 

0.9846 

0.9813 

0.9775 

0.9730 

0.9679 

9 

0.9991 

0.9988 

0.9985 

0.9980 

0.9975 

0.9968 

0.9960 

0.9949 

0.9937 

0.9922 

10 

0.9999 

0.9999 

0.9998 

0.9997 

0.9997 

0.9995 

0.9994 

0.9992 

0.9990 

0.9987 

11 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9999 

12 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


TABLE A 

(continued)


n = 13 (continued)


x 

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49

0.50 

0 

0.0010 

0.0008 

0.0007 

0.0005 

0.0004 

0.0003 

0.0003 

0.0002 

0.0002 

0.0001 

1 

0.0105 

0.0088 

0.0072 

0.0060 

0.0049 

0.0040 

0.0033 

0.0026 

0.0021 

0.0017 

2 

0.0501 

0.0431 

0.0370 

0.0316 

0.0269 

0.0228 

0.0192 

0.0162 

0.0135 

0.0112 

3 

0.1508 

0.1344 

0.1193 

0.1055 

0.0929 

0.0815 

0.0712 

0.0619 

0.0536 

0.0461 

4 

0.3258 

0.2997 

0.2746 

0.2507 

0.2279 

0.2065 

0.1863 

0.1674 

0.1498 

0.1334 

5 

0.5448 

0.5151 

0.4854 

0.4559 

0.4268 

0.3981 

0.3701 

0.3427 

0.3162 

0.2905 

6 

0.7476 

0.7230 

0.6975 

0.6710 

0.6437 

0.6158 

0.5873 

0.5585 

0.5293 

0.5000 

7 

0.8886 

0.8736 

0.8574 

0.8400 

0.8212 

0.8012 

0.7800 

0.7576 

0.7341 

0.7095 

8 

0.9621 

0.9554 

0.9480 

0.9395 

0.9302 

0.9197 

0.9082 

0.8955 

0.8817 

0.8666 

9 

0.9904 

0.9883 

0.9859 

0.9830 

0.9797 

0.9758 

0.9713 

0.9662 

0.9604 

0.9539 

10 

0.9983 

0.9979 

0.9973 

0.9967 

0.9959 

0.9949 

0.9937 

0.9923 

0.9907 

0.9888 

11 

0.9998 

0.9998 

0.9997 

0.9996 

0.9995 

0.9993 

0.9991 

0.9989 

0.9986 

0.9983 

12 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9999 

13 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 
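
Each entry of Table A is a cumulative binomial probability, P(X ≤ x) for a given n and p, so any entry can be checked, or a value of n or p that is not tabulated can be obtained, by summing the individual binomial terms. The following is a minimal sketch of that calculation, not part of the original table; the function names `binom_cdf` and `table_row` are illustrative assumptions.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """Return P(X <= x) for X ~ Binomial(n, p) by summing the individual terms."""
    if x < 0:
        return 0.0
    x = min(x, n)  # P(X <= x) = 1 for any x >= n
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def table_row(x: int, n: int, ps: list[float]) -> list[float]:
    """Return one row of the table: P(X <= x) at each p, rounded to 4 places."""
    return [round(binom_cdf(x, n, p), 4) for p in ps]


if __name__ == "__main__":
    # Reproduce the x = 6 row of the n = 13 table for p = 0.41, ..., 0.50.
    ps = [round(0.41 + 0.01 * i, 2) for i in range(10)]
    print(table_row(6, 13, ps))
```

For n = 13 and p = 0.50 the distribution is symmetric about 6.5, so `binom_cdf(6, 13, 0.50)` is exactly one half, in agreement with the tabulated 0.5000.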


n = 14 


x \ p

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 


0 

0.8687 

0.7536 

0.6528 

0.5647 

0.4877 

0.4205 

0.3620 

0.3112 

0.2670 

0.2288 


1 

0.9916 

0.9690 

0.9355 

0.8941 

0.8470 

0.7963 

0.7436 

0.6900 

0.6368 

0.5846 


2 

0.9997 

0.9975 

0.9923 

0.9833 

0.9699 

0.9522 

0.9302 

0.9042 

0.8745 

0.8416 


3 

1.0000 

0.9999 

0.9994 

0.9981 

0.9958 

0.9920 

0.9864 

0.9786 

0.9685 

0.9559 


4 

1.0000 

1.0000 

1.0000 

0.9998 

0.9996 

0.9990 

0.9980 

0.9965 

0.9941 

0.9908 


5 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9996 

0.9992 

0.9985 


6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 


7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p


0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 


0 

0.1956 

0.1670 

0.1423 

0.1211 

0.1028 

0.0871 

0.0736 

0.0621 

0.0523 

0.0440 


1 

0.5342 

0.4859 

0.4401 

0.3969 

0.3567 

0.3193 

0.2848 

0.2531 

0.2242 

0.1979 


2 

0.8061 

0.7685 

0.7292 

0.6889 

0.6479 

0.6068 

0.5659 

0.5256 

0.4862 

0.4481 


3 

0.9406 

0.9226 

0.9021 

0.8790 

0.8535 

0.8258 

0.7962 

0.7649 

0.7321 

0.6982 


4 

0.9863 

0.9804 

0.9731 

0.9641 

0.9533 

0.9406 

0.9259 

0.9093 

0.8907 

0.8702 


5 

0.9976 

0.9962 

0.9943 

0.9918 

0.9885 

0.9843 

0.9791 

0.9727 

0.9651 

0.9561 


6 

0.9997 

0.9994 

0.9991 

0.9985 

0.9978 

0.9968 

0.9954 

0.9936 

0.9913 

0.9884 


7 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9995 

0.9992 

0.9988 

0.9983 

0.9976 


8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 


9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 


0 

0.0369 

0.0309 

0.0258 

0.0214 

0.0178 

0.0148 

0.0122 

0.0101 

0.0083 

0.0068 


1 

0.1741 

0.1527 

0.1335 

0.1163 

0.1010 

0.0874 

0.0754 

0.0648 

0.0556 

0.0475 


2 

0.4113 

0.3761 

0.3426 

0.3109 

0.2811 

0.2533 

0.2273 

0.2033 

0.1812 

0.1608 


3 

0.6634 

0.6281 

0.5924 

0.5568 

0.5213 

0.4864 

0.4521 

0.4187 

0.3863 

0.3552 


4 

0.8477 

0.8235 

0.7977 

0.7703 

0.7415 

0.7116 

0.6807 

0.6490 

0.6168 

0.5842 


5 

0.9457 

0.9338 

0.9203 

0.9051 

0.8883 

0.8699 

0.8498 

0.8282 

0.8051 

0.7805 


6 

0.9848 

0.9804 

0.9752 

0.9690 

0.9617 

0.9533 

0.9437 

0.9327 

0.9204 

0.9067 


7 

0.9967 

0.9955 

0.9940 

0.9921 

0.9897 

0.9868 

0.9833 

0.9792 

0.9743 

0.9685 


8 

0.9994 

0.9992 

0.9989 

0.9984 

0.9978 

0.9971 

0.9962 

0.9950 

0.9935 

0.9917 


9 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

0.9995 

0.9993 

0.9991 

0.9988 

0.9983 


10 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 


11 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 



TABLE A 

(continued)


n = 14 (continued)


X 

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 

0 

0.0055 

0.0045 

0.0037 

0.0030 

0.0024 

0.0019 

0.0016 

0.0012 

0.0010 

0.0008 

1 

0.0404 

0.0343 

0.0290 

0.0244 

0.0205 

0.0172 

0.0143 

0.0119 

0.0098 

0.0081 

2 

0.1423 

0.1254 

0.1101 

0.0963 

0.0839 

0.0729 

0.0630 

0.0543 

0.0466 

0.0398 

3 

0.3253 

0.2968 

0.2699 

0.2444 

0.2205 

0.1982 

0.1774 

0.1582 

0.1405 

0.1243 

4 

0.5514 

0.5187 

0.4862 

0.4542 

0.4227 

0.3920 

0.3622 

0.3334 

0.3057 

0.2793 

5 

0.7546 

0.7276 

0.6994 

0.6703 

0.6405 

0.6101 

0.5792 

0.5481 

0.5169 

0.4859 

6 

0.8916 

0.8750 

0.8569 

0.8374 

0.8164 

0.7941 

0.7704 

0.7455 

0.7195 

0.6925 

7 

0.9619 

0.9542 

0.9455 

0.9357 

0.9247 

0.9124 

0.8988 

0.8838 

0.8675 

0.8499 

8 

0.9895 

0.9869 

0.9837 

0.9800 

0.9757 

0.9706 

0.9647 

0.9580 

0.9503 

0.9417 

9 

0.9978 

0.9971 

0.9963 

0.9952 

0.9940 

0.9924 

0.9905 

0.9883 

0.9856 

0.9825 

10 

0.9997 

0.9995 

0.9994 

0.9992 

0.9989 

0.9986 

0.9981 

0.9976 

0.9969 

0.9961 

11 

1.0000 

0.9999 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9997 

0.9995 

0.9994 

12 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

13 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 

0 

0.0006 

0.0005 

0.0004 

0.0003 

0.0002 

0.0002 

0.0001 

0.0001 

0.0001 

0.0001 

1 

0.0066 

0.0054 

0.0044 

0.0036 

0.0029 

0.0023 

0.0019 

0.0015 

0.0012 

0.0009 

2 

0.0339 

0.0287 

0.0242 

0.0203 

0.0170 

0.0142 

0.0117 

0.0097 

0.0079 

0.0065 

3 

0.1095 

0.0961 

0.0839 

0.0730 

0.0632 

0.0545 

0.0468 

0.0399 

0.0339 

0.0287 

4 

0.2541 

0.2303 

0.2078 

0.1868 

0.1672 

0.1490 

0.1322 

0.1167 

0.1026 

0.0898 

5 

0.4550 

0.4246 

0.3948 

0.3656 

0.3373 

0.3100 

0.2837 

0.2585 

0.2346 

0.2120 

6 

0.6645 

0.6357 

0.6063 

0.5764 

0.5461 

0.5157 

0.4852 

0.4549 

0.4249 

0.3953 

7 

0.8308 

0.8104 

0.7887 

0.7656 

0.7414 

0.7160 

0.6895 

0.6620 

0.6337 

0.6047 

8 

0.9320 

0.9211 

0.9090 

0.8957 

0.8811 

0.8652 

0.8480 

0.8293 

0.8094 

0.7880 

9 

0.9788 

0.9745 

0.9696 

0.9639 

0.9574 

0.9500 

0.9417 

0.9323 

0.9218 

0.9102 

10 

0.9951 

0.9939 

0.9924 

0.9907

0.9886 

0.9861 

0.9832 

0.9798 

0.9759 

0.9713 

11 

0.9992 

0.9990 

0.9987 

0.9983 

0.9978 

0.9973 

0.9966 

0.9958 

0.9947 

0.9935 

12 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9997 

0.9996 

0.9994 

0.9993 

0.9991 

13 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

14 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


n = 15


x \ p

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 

0 

0.8601 

0.7386 

0.6333 

0.5421 

0.4633 

0.3953 

0.3367 

0.2863 

0.2430 

0.2059 

1 

0.9904 

0.9647 

0.9270 

0.8809 

0.8290 

0.7738 

0.7168 

0.6597 

0.6035 

0.5490 

2 

0.9996 

0.9970 

0.9906 

0.9797 

0.9638 

0.9429 

0.9171 

0.8870 

0.8531 

0.8159 

3 

1.0000 

0.9998 

0.9992 

0.9976 

0.9945 

0.9896 

0.9825 

0.9727 

0.9601 

0.9444 

4 

1.0000 

1.0000 

0.9999 

0.9998 

0.9994 

0.9986 

0.9972 

0.9950 

0.9918 

0.9873 

5 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9997 

0.9993 

0.9987 

0.9978 

6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9997 

7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 

0 

0.1741 

0.1470 

0.1238 

0.1041 

0.0874 

0.0731 

0.0611 

0.0510 

0.0424 

0.0352 

1 

0.4969 

0.4476 

0.4013 

0.3583 

0.3186 

0.2821 

0.2489 

0.2187 

0.1915 

0.1671 

2 

0.7762 

0.7346 

0.6916 

0.6480 

0.6042 

0.5608 

0.5181 

0.4766 

0.4365 

0.3980 

3 

0.9258 

0.9041 

0.8796 

0.8524 

0.8227 

0.7908 

0.7571 

0.7218 

0.6854 

0.6482 

4 

0.9813 

0.9735 

0.9639 

0.9522 

0.9383 

0.9222 

0.9039 

0.8833 

0.8606 

0.8358 

5 

0.9963 

0.9943 

0.9916 

0.9879 

0.9832 

0.9773 

0.9700 

0.9613 

0.9510 

0.9389 

6 

0.9994 

0.9990 

0.9985 

0.9976 

0.9964 

0.9948 

0.9926 

0.9898 

0.9863 

0.9819 





TABLE A 

(continued)


n = 15 (continued) 


x \ p

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 


7 

0.9999 

0.9999 

0.9998 

0.9996 

0.9994 

0.9990 

0.9986 

0.9979 

0.9970 

0.9958 


8 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9995 

0.9992 


9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 


10 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

X 


0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 


0 

0.0291 

0.0241 

0.0198 

0.0163 

0.0134 

0.0109 

0.0089 

0.0072 

0.0059 

0.0047 


1 

0.1453 

0.1259 

0.1087 

0.0935 

0.0802 

0.0685 

0.0583 

0.0495 

0.0419 

0.0353 


2 

0.3615 

0.3269 

0.2945 

0.2642 

0.2361 

0.2101 

0.1863 

0.1645 

0.1447 

0.1268 


3 

0.6105 

0.5726 

0.5350 

0.4978 

0.4613 

0.4258 

0.3914 

0.3584 

0.3268 

0.2969 


4 

0.8090 

0.7805 

0.7505 

0.7190 

0.6865 

0.6531 

0.6190 

0.5846 

0.5500 

0.5155 


5 

0.9252 

0.9095 

0.8921 

0.8728 

0.8516 

0.8287 

0.8042 

0.7780 

0.7505 

0.7216 


6 

0.9766 

0.9702 

0.9626 

0.9537 

0.9434 

0.9316 

0.9183 

0.9035 

0.8870 

0.8689 


7 

0.9942 

0.9922 

0.9896 

0.9865 

0.9827 

0.9781 

0.9726 

0.9662 

0.9587 

0.9500 


8 

0.9989 

0.9984 

0.9977 

0.9969 

0.9958 

0.9944 

0.9927 

0.9906 

0.9879 

0.9848 


9 

0.9998 

0.9997 

0.9996 

0.9994 

0.9992 

0.9989 

0.9985 

0.9979 

0.9972 

0.9963 


10 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

0.9995 

0.9993 


11 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 


12 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 


0 

0.0038 

0.0031 

0.0025 

0.0020 

0.0016 

0.0012 

0.0010 

0.0008 

0.0006 

0.0005 


1 

0.0296 

0.0248 

0.0206 

0.0171 

0.0142 

0.0117 

0.0096 

0.0078 

0.0064 

0.0052 


2 

0.1107 

0.0962 

0.0833 

0.0719 

0.0617 

0.0528 

0.0450 

0.0382 

0.0322 

0.0271 


3 

0.2686 

0.2420 

0.2171 

0.1940 

0.1727 

0.1531 

0.1351 

0.1187 

0.1039 

0.0905 


4 

0.4813 

0.4477 

0.4148 

0.3829 

0.3519 

0.3222 

0.2938 

0.2668 

0.2413 

0.2173 


5 

0.6916 

0.6607 

0.6291 

0.5968 

0.5643 

0.5316 

0.4989 

0.4665 

0.4346 

0.4032 


6 

0.8491 

0.8278 

0.8049 

0.7806 

0.7548 

0.7278 

0.6997 

0.6705 

0.6405 

0.6098 


7 

0.9401 

0.9289 

0.9163 

0.9023 

0.8868 

0.8698 

0.8513 

0.8313 

0.8098 

0.7869 


8 

0.9810 

0.9764 

0.9711 

0.9649 

0.9578 

0.9496 

0.9403 

0.9298 

0.9180 

0.9050 


9 

0.9952 

0.9938 

0.9921 

0.9901 

0.9876 

0.9846 

0.9810 

0.9768 

0.9719 

0.9662 


10 

0.9991 

0.9988 

0.9984 

0.9978 

0.9972 

0.9963 

0.9953 

0.9941 

0.9925 

0.9907 


11 

0.9999 

0.9998 

0.9997 

0.9996 

0.9995 

0.9994 

0.9991 

0.9989 

0.9985 

0.9981 


12 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 


13 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 


0 

0.0004 

0.0003 

0.0002 

0.0002 

0.0001 

0.0001 

0.0001 

0.0001 

0.0000 

0.0000 


1 

0.0042 

0.0034 

0.0027 

0.0021 

0.0017 

0.0013 

0.0010 

0.0008 

0.0006 

0.0005 


2 

0.0227 

0.0189 

0.0157 

0.0130 

0.0107 

0.0087 

0.0071 

0.0057 

0.0046 

0.0037 


3 

0.0785 

0.0678 

0.0583 

0.0498 

0.0424 

0.0359 

0.0303 

0.0254 

0.0212 

0.0176 


4 

0.1948 

0.1739 

0.1546 

0.1367 

0.1204 

0.1055 

0.0920 

0.0799 

0.0690 

0.0592 


5 

0.3726 

0.3430 

0.3144 

0.2869 

0.2608 

0.2359 

0.2125 

0.1905 

0.1699 

0.1509 


6 

0.5786 

0.5470 

0.5153 

0.4836 

0.4522 

0.4211 

0.3905 

0.3606 

0.3316 

0.3036 


7 

0.7626 

0.7370 

0.7102 

0.6824 

0.6535 

0.6238 

0.5935 

0.5626 

0.5314 

0.5000 


8 

0.8905 

0.8746 

0.8573 

0.8385 

0.8182 

0.7966 

0.7735 

0.7490 

0.7233 

0.6964 


9 

0.9596 

0.9521 

0.9435 

0.9339 

0.9231 

0.9110 

0.8976 

0.8829 

0.8667 

0.8491 


10 

0.9884 

0.9857 

0.9826 

0.9789 

0.9745 

0.9695 

0.9637 

0.9570 

0.9494 

0.9408 


11 

0.9975 

0.9968 

0.9960 

0.9949 

0.9937 

0.9921 

0.9903 

0.9881 

0.9855 

0.9824 


12 

0.9996 

0.9995 

0.9993 

0.9991 

0.9989 

0.9986 

0.9982 

0.9977 

0.9971 

0.9963 


13 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

0.9996 

0.9995 


14 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 



TABLE A 

(continued) 


n = 16


x \ p

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 

0 

0.8515 

0.7238 

0.6143 

0.5204 

0.4401 

0.3716 

0.3131 

0.2634 

0.2211 

0.1853 

1 

0.9891 

0.9601 

0.9182 

0.8673 

0.8108 

0.7511 

0.6902

0.6299 

0.5711 

0.5147 

2 

0.9995 

0.9963 

0.9887 

0.9758 

0.9571 

0.9327 

0.9031 

0.8688 

0.8306 

0.7892 

3 

1.0000 

0.9998 

0.9989 

0.9968 

0.9930 

0.9868 

0.9779 

0.9658 

0.9504 

0.9316 

4 

1.0000 

1.0000 

0.9999 

0.9997 

0.9991 

0.9981 

0.9962 

0.9932 

0.9889 

0.9830 

5 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9995

0.9990 

0.9981 

0.9967 

6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000

0.9999 

0.9999 

0.9997 

0.9995 

7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000

1.0000 

1.0000 

1.0000 

x \ p

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 

0 

0.1550 

0.1293 

0.1077 

0.0895 

0.0743 

0.0614 

0.0507

0.0418 

0.0343 

0.0281 

1 

0.4614 

0.4115 

0.3653 

0.3227 

0.2839 

0.2487 

0.2170 

0.1885 

0.1632 

0.1407 

2 

0.7455 

0.7001 

0.6539 

0.6074 

0.5614 

0.5162 

0.4723 

0.4302 

0.3899 

0.3518 

3 

0.9093 

0.8838 

0.8552 

0.8237 

0.7899 

0.7540 

0.7164 

0.6777 

0.6381 

0.5981 

4 

0.9752 

0.9652 

0.9529 

0.9382 

0.9209 

0.9012 

0.8789 

0.8542 

0.8273 

0.7982 

5 

0.9947 

0.9918 

0.9880 

0.9829 

0.9765 

0.9685 

0.9588 

0.9473 

0.9338 

0.9183 

6 

0.9991 

0.9985 

0.9976 

0.9962 

0.9944 

0.9920 

0.9888 

0.9847 

0.9796 

0.9733 

7 

0.9999 

0.9998 

0.9996 

0.9993 

0.9989 

0.9984 

0.9976 

0.9964 

0.9949 

0.9930 

8 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9993 

0.9990 

0.9985 

9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9998 

10 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000

1.0000 

1.0000 

1.0000 

x 

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 

0 

0.0230 

0.0188 

0.0153 

0.0124 

0.0100 

0.0081 

0.0065 

0.0052 

0.0042 

0.0033 

1 

0.1209 

0.1035 

0.0883 

0.0750 

0.0635 

0.0535 

0.0450 

0.0377 

0.0314 

0.0261 

2 

0.3161 

0.2827 

0.2517 

0.2232 

0.1971 

0.1733 

0.1518 

0.1323 

0.1149 

0.0994 

3 

0.5582 

0.5186 

0.4797 

0.4417 

0.4050 

0.3697 

0.3360 

0.3041 

0.2740 

0.2459 

4 

0.7673 

0.7348 

0.7009 

0.6659 

0.6302 

0.5940 

0.5575 

0.5212 

0.4853 

0.4499 

5 

0.9008 

0.8812 

0.8595 

0.8359 

0.8103 

0.7831 

0.7542 

0.7239 

0.6923 

0.6598 

6 

0.9658 

0.9568 

0.9464 

0.9342 

0.9204 

0.9049 

0.8875 

0.8683 

0.8474 

0.8247 

7 

0.9905 

0.9873 

0.9834 

0.9786 

0.9729 

0.9660 

0.9580 

0.9486 

0.9379 

0.9256 

8 

0.9979 

0.9970 

0.9959 

0.9944 

0.9925 

0.9902 

0.9873 

0.9837 

0.9794 

0.9743 

9 

0.9996 

0.9994 

0.9992 

0.9988 

0.9984 

0.9977 

0.9969 

0.9959 

0.9945 

0.9929 

10 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9994 

0.9992 

0.9989 

0.9984 

11 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

12 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 

0 

0.0026 

0.0021 

0.0016 

0.0013 

0.0010 

0.0008 

0.0006 

0.0005 

0.0004 

0.0003 

1 

0.0216 

0.0178 

0.0146 

0.0120 

0.0098 

0.0079 

0.0064 

0.0052 

0.0041 

0.0033 

2 

0.0856 

0.0734 

0.0626 

0.0533 

0.0451 

0.0380 

0.0319 

0.0266 

0.0222 

0.0183 

3 

0.2196 

0.1953 

0.1730 

0.1525 

0.1339 

0.1170 

0.1018 

0.0881 

0.0759 

0.0651 

4 

0.4154 

0.3819 

0.3496 

0.3187 

0.2892 

0.2613 

0.2351 

0.2105 

0.1877 

0.1666 

5 

0.6264 

0.5926 

0.5584 

0.5241 

0.4900 

0.4562 

0.4230 

0.3906 

0.3592 

0.3288 

6 

0.8003 

0.7743 

0.7469 

0.7181 

0.6881 

0.6572 

0.6254 

0.5930 

0.5602 

0.5272 

7 

0.9119 

0.8965 

0.8795 

0.8609 

0.8406 

0.8187 

0.7952 

0.7702 

0.7438 

0.7161 

8 

0.9683 

0.9612 

0.9530 

0.9436 

0.9329 

0.9209 

0.9074 

0.8924 

0.8758 

0.8577 

9 

0.9908 

0.9883 

0.9852 

0.9815 

0.9771 

0.9720 

0.9659 

0.9589 

0.9509 

0.9417 

10 

0.9979 

0.9972 

0.9963 

0.9952 

0.9938 

0.9921 

0.9900 

0.9875 

0.9845 

0.9809 

11 

0.9996 

0.9995 

0.9993 

0.9990 

0.9987 

0.9983 

0.9977 

0.9970 

0.9962 

0.9951 

12 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9995 

0.9993 

0.9991 




TABLE A 

(continued)


n = 16 (continued)


x \ p

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 

13 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

14 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 

0 

0.0002 

0.0002 

0.0001 

0.0001 

0.0001 

0.0001 

0.0000 

0.0000 

0.0000 

0.0000 

1 

0.0026 

0.0021 

0.0016 

0.0013 

0.0010 

0.0008 

0.0006 

0.0005 

0.0003 

0.0003 

2 

0.0151 

0.0124 

0.0101 

0.0082 

0.0066 

0.0053 

0.0042 

0.0034 

0.0027 

0.0021 

3 

0.0556 

0.0473 

0.0400 

0.0336 

0.0281 

0.0234 

0.0194 

0.0160 

0.0131 

0.0106 

4 

0.1471 

0.1293 

0.1131 

0.0985 

0.0853 

0.0735 

0.0630 

0.0537 

0.0456 

0.0384 

5 

0.2997 

0.2720 

0.2457 

0.2208 

0.1976 

0.1759 

0.1559 

0.1374 

0.1205 

0.1051 

6 

0.4942 

0.4613 

0.4289 

0.3971 

0.3660 

0.3359 

0.3068 

0.2790 

0.2524 

0.2272 

7 

0.6872 

0.6572 

0.6264 

0.5949 

0.5629 

0.5306 

0.4981 

0.4657 

0.4335 

0.4018 

8 

0.8381 

0.8168 

0.7940 

0.7698 

0.7441 

0.7171 

0.6889 

0.6596 

0.6293 

0.5982 

9 

0.9313 

0.9195 

0.9064 

0.8919 

0.8759 

0.8584 

0.8393 

0.8186 

0.7964 

0.7728 

10 

0.9766 

0.9716 

0.9658 

0.9591 

0.9514 

0.9426 

0.9326 

0.9214 

0.9089 

0.8949 

11 

0.9938 

0.9922 

0.9902 

0.9879 

0.9851 

0.9817 

0.9778 

0.9732 

0.9678 

0.9616 

12 

0.9988 

0.9984 

0.9979 

0.9973 

0.9965 

0.9956 

0.9945 

0.9931 

0.9914 

0.9894 

13 

0.9998 

0.9998 

0.9997 

0.9996 

0.9994 

0.9993 

0.9990 

0.9987 

0.9984 

0.9979 

14 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

15 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


n = 17 


x \ p

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 

0 

0.8429 

0.7093 

0.5958 

0.4996 

0.4181 

0.3493 

0.2912 

0.2423 

0.2012 

0.1668 

1 

0.9877 

0.9554 

0.9091 

0.8535 

0.7922 

0.7283 

0.6638 

0.6005 

0.5396 

0.4818 

2 

0.9994 

0.9956 

0.9866 

0.9714 

0.9497 

0.9218 

0.8882 

0.8497 

0.8073 

0.7618 

3 

1.0000 

0.9997 

0.9986 

0.9960 

0.9912 

0.9836 

0.9727 

0.9581 

0.9397 

0.9174 

4 

1.0000 

1.0000 

0.9999 

0.9996 

0.9988 

0.9974 

0.9949 

0.9911 

0.9855 

0.9779 

5 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9997 

0.9993 

0.9985 

0.9973 

0.9953 

6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9996 

0.9992 

7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 

0 

0.1379 

0.1138 

0.0937 

0.0770 

0.0631 

0.0516 

0.0421 

0.0343 

0.0278 

0.0225 

1 

0.4277 

0.3777 

0.3318 

0.2901 

0.2525 

0.2187 

0.1887 

0.1621 

0.1387 

0.1182 

2 

0.7142

0.6655 

0.6164 

0.5676 

0.5198 

0.4734 

0.4289 

0.3867 

0.3468 

0.3096 

3 

0.8913 

0.8617 

0.8290 

0.7935 

0.7556 

0.7159 

0.6749 

0.6331 

0.5909 

0.5489 

4 

0.9679 

0.9554 

0.9402 

0.9222 

0.9013 

0.8776 

0.8513 

0.8225 

0.7913 

0.7582 

5 

0.9925 

0.9886 

0.9834 

0.9766 

0.9681 

0.9577 

0.9452 

0.9305 

0.9136 

0.8943 

6 

0.9986 

0.9977 

0.9963 

0.9944 

0.9917 

0.9882 

0.9837 

0.9780 

0.9709 

0.9623 

7 

0.9998 

0.9996 

0.9993 

0.9989 

0.9983 

0.9973 

0.9961 

0.9943 

0.9920 

0.9891 

8 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9995 

0.9992 

0.9988 

0.9982 

0.9974 

9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9995 

10 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

11 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 



TABLE A 

(continued) 


n = 17 (continued)


X 

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 

0 

0.0182 

0.0146 

0.0118 

0.0094 

0.0075 

0.0060 

0.0047 

0.0038 

0.0030 

0.0023 

1 

0.1004 

0.0849 

0.0715 

0.0600 

0.0501 

0.0417 

0.0346 

0.0286 

0.0235 

0.0193 

2 

0.2751 

0.2433 

0.2141 

0.1877 

0.1637 

0.1422 

0.1229 

0.1058 

0.0907 

0.0774 

3 

0.5073 

0.4667 

0.4272 

0.3893 

0.3530 

0.3186 

0.2863 

0.2560 

0.2279 

0.2019 

4 

0.7234 

0.6872 

0.6500 

0.6121 

0.5739 

0.5357 

0.4977 

0.4604 

0.4240 

0.3887 

5 

0.8727 

0.8490 

0.8230 

0.7951 

0.7653 

0.7339 

0.7011 

0.6671 

0.6323 

0.5968 

6 

0.9521 

0.9402 

0.9264 

0.9106 

0.8929 

0.8732 

0.8515 

0.8279 

0.8024 

0.7752 

7 

0.9853 

0.9806 

0.9749 

0.9680 

0.9598 

0.9501 

0.9389 

0.9261 

0.9116 

0.8954 

8 

0.9963 

0.9949 

0.9930 

0.9906 

0.9876 

0.9839 

0.9794 

0.9739 

0.9674 

0.9597 

9 

0.9993 

0.9989 

0.9984 

0.9978 

0.9969 

0.9958 

0.9943 

0.9925 

0.9902 

0.9873 

10 

0.9999 

0.9998 

0.9997 

0.9996 

0.9994 

0.9991 

0.9987 

0.9982 

0.9976 

0.9968 

11 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

0.9995 

0.9993 

12 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

13 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x 

0.31

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 

0 

0.0018 

0.0014 

0.0011 

0.0009 

0.0007 

0.0005 

0.0004 

0.0003 

0.0002 

0.0002 

1 

0.0157 

0.0128 

0.0104 

0.0083 

0.0067 

0.0054 

0.0043 

0.0034 

0.0027 

0.0021 

2 

0.0657 

0.0556 

0.0468 

0.0392 

0.0327 

0.0272 

0.0225 

0.0185 

0.0151 

0.0123 

3 

0.1781 

0.1563 

0.1366 

0.1188 

0.1028 

0.0885 

0.0759 

0.0648 

0.0550 

0.0464 

4 

0.3547 

0.3222 

0.2913 

0.2622 

0.2348 

0.2094 

0.1858 

0.1640 

0.1441 

0.1260 

5 

0.5610 

0.5251 

0.4895 

0.4542 

0.4197 

0.3861 

0.3535 

0.3222 

0.2923 

0.2639 

6 

0.7464 

0.7162 

0.6847 

0.6521 

0.6188 

0.5848 

0.5505 

0.5161 

0.4818 

0.4478 

7 

0.8773 

0.8574 

0.8358 

0.8123 

0.7872 

0.7605 

0.7324 

0.7029 

0.6722 

0.6405 

8 

0.9508 

0.9405 

0.9288 

0.9155 

0.9006 

0.8841 

0.8659 

0.8459 

0.8243 

0.8011 

9 

0.9838 

0.9796 

0.9746 

0.9686 

0.9617 

0.9536 

0.9443 

0.9336 

0.9216 

0.9081 

10 

0.9957 

0.9943 

0.9926 

0.9905 

0.9880 

0.9849 

0.9811 

0.9766 

0.9714 

0.9652 

11 

0.9991 

0.9987 

0.9983 

0.9977 

0.9970 

0.9960 

0.9949 

0.9934 

0.9916 

0.9894 

12 

0.9998 

0.9998 

0.9997 

0.9996 

0.9994 

0.9992 

0.9989 

0.9985 

0.9981 

0.9975 

13 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

0.9995 

14 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

15 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x 

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 

0 

0.0001 

0.0001 

0.0001 

0.0001 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

1 

0.0016 

0.0013 

0.0010 

0.0008 

0.0006 

0.0004 

0.0003 

0.0002 

0.0002 

0.0001 

2 

0.0100 

0.0080 

0.0065 

0.0052 

0.0041 

0.0032 

0.0025 

0.0020 

0.0015 

0.0012 

3 

0.0390 

0.0326 

0.0271 

0.0224 

0.0184 

0.0151 

0.0123 

0.0099 

0.0080 

0.0064 

4 

0.1096 

0.0949 

0.0817 

0.0699 

0.0596 

0.0505 

0.0425 

0.0356 

0.0296 

0.0245 

5 

0.2372 

0.2121 

0.1887 

0.1670 

0.1471 

0.1288 

0.1122 

0.0972 

0.0838 

0.0717 

6 

0.4144 

0.3818 

0.3501 

0.3195 

0.2902 

0.2623 

0.2359 

0.2110 

0.1878 

0.1662 

7 

0.6080 

0.5750 

0.5415 

0.5079 

0.4743 

0.4410 

0.4082 

0.3761 

0.3448 

0.3145 

8 

0.7762 

0.7498 

0.7220 

0.6928 

0.6626 

0.6313 

0.5992 

0.5665 

0.5333 

0.5000 

9 

0.8930 

0.8764 

0.8581 

0.8382 

0.8166 

0.7934 

0.7686 

0.7423 

0.7145 

0.6855 

10 

0.9580 

0.9497 

0.9403 

0.9295 

0.9174 

0.9038 

0.8888 

0.8721 

0.8538 

0.8338 

11 

0.9867 

0.9835 

0.9797 

0.9752 

0.9699 

0.9637 

0.9566 

0.9483 

0.9389 

0.9283 

12 

0.9967 

0.9958 

0.9946 

0.9931 

0.9914 

0.9892 

0.9866 

0.9835 

0.9798 

0.9755 

13 

0.9994 

0.9992 

0.9989 

0.9986 

0.9981 

0.9976 

0.9969 

0.9960 

0.9950 

0.9936 

14 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

0.9996 

0.9995 

0.9993 

0.9991 

0.9988 

15 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9999 

16 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 



n = 18 


\p 

x 

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 


0 

0.8345 

0.6951 

0.5780 

0.4796 

0.3972 

0.3283 

0.2708 

0.2229 

0.1831 

0.1501 


1 

0.9862 

0.9505 

0.8997 

0.8393 

0.7735 

0.7055 

0.6378 

0.5719 

0.5091 

0.4503 


2 

0.9993 

0.9948 

0.9843 

0.9667 

0.9419 

0.9102 

0.8725 

0.8298 

0.7832 

0.7338 


3 

1.0000 

0.9996 

0.9982 

0.9950 

0.9891 

0.9799 

0.9667 

0.9494 

0.9277 

0.9018 


4 

1.0000 

1.0000 

0.9998 

0.9994 

0.9985 

0.9966 

0.9933 

0.9884 

0.9814 

0.9718 


5 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9995 

0.9990 

0.9979 

0.9962 

0.9936 


6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9997 

0.9994 

0.9988 


7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 


8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

\p 

x 

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 


0 

0.1227 

0.1002 

0.0815 

0.0662 

0.0536 

0.0434 

0.0349 

0.0281 

0.0225 

0.0180 


1 

0.3958 

0.3460 

0.3008 

0.2602 

0.2241 

0.1920 

0.1638 

0.1391 

0.1176 

0.0991 


2 

0.6827 

0.6310 

0.5794 

0.5287 

0.4797 

0.4327 

0.3881 

0.3462 

0.3073 

0.2713 


3 

0.8718 

0.8382 

0.8014 

0.7618 

0.7202 

0.6771 

0.6331 

0.5888 

0.5446 

0.5010 


4 

0.9595 

0.9442 

0.9257 

0.9041 

0.8794 

0.8518 

0.8213 

0.7884 

0.7533 

0.7164 


5 

0.9898 

0.9846 

0.9778 

0.9690 

0.9581 

0.9449 

0.9292 

0.9111 

0.8903 

0.8671 


6 

0.9979 

0.9966 

0.9946 

0.9919 

0.9882 

0.9833 

0.9771 

0.9694 

0.9600 

0.9487 


7 

0.9997 

0.9994 

0.9989 

0.9983 

0.9973 

0.9959 

0.9940 

0.9914 

0.9880 

0.9837 


8 

1.0000 

0.9999 

0.9998 

0.9997 

0.9995 

0.9992 

0.9987 

0.9980 

0.9971 

0.9957 


9 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9996 

0.9994 

0.9991 


10 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 


11 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

\p 

x 


0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 


0 

0.0144 

0.0114 

0.0091 

0.0072 

0.0056 

0.0044 

0.0035 

0.0027 

0.0021 

0.0016 


1 

0.0831 

0.0694 

0.0577 

0.0478 

0.0395 

0.0324 

0.0265 

0.0216 

0.0176 

0.0142 


2 

0.2384 

0.2084 

0.1813 

0.1570 

0.1353 

0.1161 

0.0991 

0.0842 

0.0712 

0.0600 


3 

0.4586 

0.4175 

0.3782 

0.3409 

0.3057 

0.2728 

0.2422 

0.2140 

0.1881 

0.1646 


4 

0.6780 

0.6387 

0.5988 

0.5586 

0.5187 

0.4792 

0.4406 

0.4032 

0.3671 

0.3327 


5 

0.8414 

0.8134 

0.7832 

0.7512 

0.7174 

0.6824 

0.6462 

0.6093 

0.5719 

0.5344 


6 

0.9355 

0.9201 

0.9026 

0.8829 

0.8610 

0.8370 

0.8109 

0.7829 

0.7531 

0.7217 


7 

0.9783 

0.9717 

0.9637 

0.9542 

0.9431 

0.9301 

0.9153 

0.8986 

0.8800 

0.8593 


8 

0.9940 

0.9917 

0.9888 

0.9852 

0.9807 

0.9751 

0.9684 

0.9605 

0.9512 

0.9404 


9 

0.9986 

0.9980 

0.9972 

0.9961 

0.9946 

0.9927 

0.9903 

0.9873 

0.9836 

0.9790 


10 

0.9997 

0.9996 

0.9994 

0.9991 

0.9988 

0.9982 

0.9975 

0.9966 

0.9954 

0.9939 


11 

1.0000 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

0.9995 

0.9993 

0.9990 

0.9986 


12 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 


13 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

\p 

x 

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 


0 

0.0013 

0.0010 

0.0007 

0.0006 

0.0004 

0.0003 

0.0002 

0.0002 

0.0001 

0.0001 


1 

0.0114 

0.0092 

0.0073 

0.0058 

0.0046 

0.0036 

0.0028 

0.0022 

0.0017 

0.0013 


2 

0.0502 

0.0419 

0.0348 

0.0287 

0.0236 

0.0193 

0.0157 

0.0127 

0.0103 

0.0082 


3 

0.1432 

0.1241 

0.1069 

0.0917 

0.0783 

0.0665 

0.0561 

0.0472 

0.0394 

0.0328 


4 

0.2999 

0.2691 

0.2402 

0.2134 

0.1886 

0.1659 

0.1451 

0.1263 

0.1093 

0.0942 


5 

0.4971 

0.4602 

0.4241 

0.3889 

0.3550 

0.3224 

0.2914 

0.2621 

0.2345 

0.2088 


6 

0.6889 

0.6550 

0.6202 

0.5849 

0.5491 

0.5133 

0.4776 

0.4424 

0.4079 

0.3743 


7 

0.8367 

0.8122 

0.7859 

0.7579 

0.7283 

0.6973 

0.6651 

0.6319 

0.5979 

0.5634 


8 

0.9280 

0.9139 

0.8981 

0.8804 

0.8609 

0.8396 

0.8165 

0.7916 

0.7650 

0.7368 


9 

0.9736 

0.9671 

0.9595 

0.9506 

0.9403 

0.9286 

0.9153 

0.9003 

0.8837 

0.8653 



TABLE A 

( continued ) 


n = 18 ( continued ) 


x 

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 

10 

0.9920 

0.9896 

0.9867 

0.9831 

0.9788 

0.9736 

0.9675 

0.9603 

0.9520 

0.9424 

11 

0.9980 

0.9973 

0.9964 

0.9953 

0.9938 

0.9920 

0.9898 

0.9870 

0.9837 

0.9797 

12 

0.9996 

0.9995 

0.9992 

0.9989 

0.9986 

0.9981 

0.9974 

0.9966 

0.9956 

0.9942 

13 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9995 

0.9993 

0.9990 

0.9987 

14 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 

15 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x 

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 

0 

0.0001 

0.0001 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

1 

0.0010 

0.0008 

0.0006 

0.0004 

0.0003 

0.0002 

0.0002 

0.0001 

0.0001 

0.0001 

2 

0.0066 

0.0052 

0.0041 

0.0032 

0.0025 

0.0019 

0.0015 

0.0011 

0.0009 

0.0007 

3 

0.0271 

0.0223 

0.0182 

0.0148 

0.0120 

0.0096 

0.0077 

0.0061 

0.0048 

0.0038 

4 

0.0807 

0.0687 

0.0582 

0.0490 

0.0411 

0.0342 

0.0283 

0.0233 

0.0190 

0.0154 

5 

0.1849 

0.1628 

0.1427 

0.1243 

0.1077 

0.0928 

0.0795 

0.0676 

0.0572 

0.0481 

6 

0.3418 

0.3105 

0.2807 

0.2524 

0.2258 

0.2009 

0.1778 

0.1564 

0.1368 

0.1189 

7 

0.5287 

0.4938 

0.4592 

0.4250 

0.3915 

0.3588 

0.3272 

0.2968 

0.2678 

0.2403 

8 

0.7072 

0.6764 

0.6444 

0.6115 

0.5778 

0.5438 

0.5094 

0.4751 

0.4409 

0.4073 

9 

0.8451 

0.8232 

0.7996 

0.7742 

0.7473 

0.7188 

0.6890 

0.6579 

0.6258 

0.5927 

10 

0.9314 

0.9189 

0.9049 

0.8893 

0.8720 

0.8530 

0.8323 

0.8098 

0.7856 

0.7597 

11 

0.9750 

0.9693 

0.9628 

0.9551 

0.9463 

0.9362 

0.9247 

0.9117 

0.8972 

0.8811 

12 

0.9926 

0.9906 

0.9882 

0.9853 

0.9817 

0.9775 

0.9725 

0.9666 

0.9598 

0.9519 

13 

0.9983 

0.9978 

0.9971 

0.9962 

0.9951 

0.9937 

0.9921 

0.9900 

0.9875 

0.9846 

14 

0.9997 

0.9996 

0.9994 

0.9993 

0.9990 

0.9987 

0.9983 

0.9977 

0.9971 

0.9962 

15 

1.0000 

0.9999 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9995 

0.9993 

16 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

17 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 
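
The entries of Table A can be cross-checked numerically. What follows is only a minimal sketch, assuming that each entry is the cumulative binomial probability P(X <= x) for n trials with success probability p (for example, the x = 0, p = 0.10 entry for n = 18 equals 0.90^18, or about 0.1501). The names `binom_cdf` and `table_row` are illustrative and do not come from the text.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """Return P(X <= x) for X ~ Binomial(n, p) by summing the point probabilities."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def table_row(x: int, n: int, ps: list[float]) -> list[float]:
    """Return one table row: P(X <= x) for each p in ps, rounded to four decimals."""
    return [round(binom_cdf(x, n, p), 4) for p in ps]


if __name__ == "__main__":
    # Should agree with the x = 2 row of the n = 18 block at p = 0.01, 0.05, 0.10.
    print(table_row(2, 18, [0.01, 0.05, 0.10]))
```

A check of this kind is mainly useful for spotting entries that were damaged in reproduction: within a row the probabilities must decrease as p increases, and within a column they must increase with x.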


n =19 


\p 
x 

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 

0 

0.8262 

0.6812 

0.5606 

0.4604 

0.3774 

0.3086 

0.2519 

0.2051 

0.1666 

0.1351 

1 

0.9847 

0.9454 

0.8900 

0.8249 

0.7547 

0.6829 

0.6121 

0.5440 

0.4798 

0.4203 

2 

0.9991 

0.9939 

0.9817 

0.9616 

0.9335 

0.8979 

0.8561 

0.8092 

0.7585 

0.7054 

3 

1.0000 

0.9995 

0.9978 

0.9939 

0.9868 

0.9757 

0.9602 

0.9398 

0.9147 

0.8850 

4 

1.0000 

1.0000 

0.9998 

0.9993 

0.9980 

0.9956 

0.9915 

0.9853 

0.9765 

0.9648 

5 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9994 

0.9986 

0.9971 

0.9949 

0.9914 

6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9996 

0.9991 

0.9983 

7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9997 

8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

\P 

x 

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 

0 

0.1092 

0.0881 

0.0709 

0.0569 

0.0456 

0.0364 

0.0290 

0.0230 

0.0182 

0.0144 

1 

0.3658 

0.3165 

0.2723 

0.2331 

0.1985 

0.1682 

0.1419 

0.1191 

0.0996 

0.0829 

2 

0.6512 

0.5968 

0.5432 

0.4911 

0.4413 

0.3941 

0.3500 

0.3090 

0.2713 

0.2369 

3 

0.8510 

0.8133 

0.7725 

0.7292 

0.6841 

0.6380 

0.5915 

0.5451 

0.4995 

0.4551 

4 

0.9498 

0.9315 

0.9096 

0.8842 

0.8556 

0.8238 

0.7893 

0.7524 

0.7136 

0.6733 

5 

0.9865 

0.9798 

0.9710 

0.9599 

0.9463 

0.9300 

0.9109 

0.8890 

0.8643 

0.8369 

6 

0.9970 

0.9952 

0.9924 

0.9887 

0.9837 

0.9772 

0.9690 

0.9589 

0.9468 

0.9324 

7 

0.9995 

0.9991 

0.9984 

0.9974 

0.9959 

0.9939 

0.9911 

0.9874 

0.9827 

0.9767 

8 

0.9999 

0.9998 

0.9997 

0.9995 

0.9992 

0.9986 

0.9979 

0.9968 

0.9953 

0.9933 

9 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9996 

0.9993 

0.9990 

0.9984 



TABLE A 

(continued) 


n = 19 ( continued ) 


\p 

X 

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 

10 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

11 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

\P 

X 

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 

0 

0.0113 

0.0089 

0.0070 

0.0054 

0.0042 

0.0033 

0.0025 

0.0019 

0.0015 

0.0011 

1 

0.0687 

0.0566 

0.0465 

0.0381 

0.0310 

0.0251 

0.0203 

0.0163 

0.0131 

0.0104 

2 

0.2058 

0.1778 

0.1529 

0.1308 

0.1113 

0.0943 

0.0795 

0.0667 

0.0557 

0.0462 

3 

0.4123 

0.3715 

0.3329 

0.2968 

0.2631 

0.2320 

0.2035 

0.1776 

0.1542 

0.1332 

4 

0.6319 

0.5900 

0.5480 

0.5064 

0.4654 

0.4256 

0.3871 

0.3502 

0.3152 

0.2822 

5 

0.8071 

0.7749 

0.7408 

0.7050 

0.6677 

0.6295 

0.5907 

0.5516 

0.5125 

0.4739 

6 

0.9157 

0.8966 

0.8751 

0.8513 

0.8251 

0.7968 

0.7664 

0.7343 

0.7005 

0.6655 

7 

0.9693 

0.9604 

0.9497 

0.9371 

0.9225 

0.9059 

0.8871 

0.8662 

0.8432 

0.8180 

8 

0.9907 

0.9873 

0.9831 

0.9778 

0.9713 

0.9634 

0.9541 

0.9432 

0.9306 

0.9161 

9 

0.9977 

0.9966 

0.9953 

0.9934 

0.9911 

0.9881 

0.9844 

0.9798 

0.9742 

0.9674 

10 

0.9995 

0.9993 

0.9989 

0.9984 

0.9977 

0.9968 

0.9956 

0.9940 

0.9920 

0.9895 

11 

0.9999 

0.9999 

0.9998 

0.9997 

0.9995 

0.9993 

0.9990 

0.9985 

0.9980 

0.9972 

12 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9994 

13 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

14 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x 

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 

0 

0.0009 

0.0007 

0.0005 

0.0004 

0.0003 

0.0002 

0.0002 

0.0001 

0.0001 

0.0001 

1 

0.0083 

0.0065 

0.0051 

0.0040 

0.0031 

0.0024 

0.0019 

0.0014 

0.0011 

0.0008 

2 

0.0382 

0.0314 

0.0257 

0.0209 

0.0170 

0.0137 

0.0110 

0.0087 

0.0069 

0.0055 

3 

0.1144 

0.0978 

0.0831 

0.0703 

0.0591 

0.0495 

0.0412 

0.0341 

0.0281 

0.0230 

4 

0.2514 

0.2227 

0.1963 

0.1720 

0.1500 

0.1301 

0.1122 

0.0962 

0.0821 

0.0696 

5 

0.4359 

0.3990 

0.3634 

0.3293 

0.2968 

0.2661 

0.2373 

0.2105 

0.1857 

0.1629 

6 

0.6294 

0.5927 

0.5555 

0.5182 

0.4812 

0.4446 

0.4087 

0.3739 

0.3403 

0.3081 

7 

0.7909 

0.7619 

0.7312 

0.6990 

0.6656 

0.6310 

0.5957 

0.5599 

0.5238 

0.4878 

8 

0.8997 

0.8814 

0.8611 

0.8388 

0.8145 

0.7884 

0.7605 

0.7309 

0.6998 

0.6675 

9 

0.9595 

0.9501 

0.9392 

0.9267 

0.9125 

0.8965 

0.8787 

0.8590 

0.8374 

0.8139 

10 

0.9863 

0.9824 

0.9777 

0.9720 

0.9653 

0.9574 

0.9482 

0.9375 

0.9253 

0.9115 

11 

0.9962 

0.9949 

0.9932 

0.9911 

0.9886 

0.9854 

0.9815 

0.9769 

0.9713 

0.9648 

12 

0.9991 

0.9988 

0.9983 

0.9977 

0.9969 

0.9959 

0.9946 

0.9930 

0.9909 

0.9884 

13 

0.9998 

0.9998 

0.9997 

0.9995 

0.9993 

0.9991 

0.9987 

0.9983 

0.9977 

0.9969 

14 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

0.9995 

0.9994 

15 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

16 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

\p 

x 

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 

0 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

1 

0.0006 

0.0005 

0.0004 

0.0003 

0.0002 

0.0001 

0.0001 

0.0001 

0.0001 

0.0000 

2 

0.0043 

0.0033 

0.0026 

0.0020 

0.0015 

0.0012 

0.0009 

0.0007 

0.0005 

0.0004 

3 

0.0187 

0.0151 

0.0122 

0.0097 

0.0077 

0.0061 

0.0048 

0.0037 

0.0029 

0.0022 

4 

0.0587 

0.0492 

0.0410 

0.0340 

0.0280 

0.0229 

0.0186 

0.0150 

0.0121 

0.0096 

5 

0.1421 

0.1233 

0.1063 

0.0912 

0.0777 

0.0658 

0.0554 

0.0463 

0.0385 

0.0318 

6 

0.2774 

0.2485 

0.2213 

0.1961 

0.1727 

0.1512 

0.1316 

0.1138 

0.0978 

0.0835 

7 

0.4520 

0.4168 

0.3824 

0.3491 

0.3169 

0.2862 

0.2570 

0.2294 

0.2036 

0.1796 

8 

0.6340 

0.5997 

0.5647 

0.5294 

0.4940 

0.4587 

0.4238 

0.3895 

0.3561 

0.3238 

9 

0.7886 

0.7615 

0.7328 

0.7026 

0.6710 

0.6383 

0.6046 

0.5701 

0.5352 

0.5000 


TABLE A 

(continued) 


n = 19 (continued) 


x 

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 


10 

0.8960 

0.8787 

0.8596 

0.8387 

0.8159 

0.7913 

0.7649 

0.7369 

0.7073 

0.6762 


11 

0.9571 

0.9482 

0.9379 

0.9262 

0.9129 

0.8979 

0.8813 

0.8628 

0.8425 

0.8204 


12 

0.9854 

0.9817 

0.9773 

0.9720 

0.9658 

0.9585 

0.9500 

0.9403 

0.9291 

0.9165 


13 

0.9960 

0.9948 

0.9933 

0.9914 

0.9891 

0.9863 

0.9829 

0.9788 

0.9739 

0.9682 


14 

0.9991 

0.9988 

0.9984 

0.9979 

0.9972 

0.9964 

0.9954 

0.9940 

0.9924 

0.9904 


15 

0.9999 

0.9998 

0.9997 

0.9996 

0.9995 

0.9993 

0.9990 

0.9987 

0.9983 

0.9978 


16 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 


17 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 






n = 20 







\p 
x 

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 


0 

0.8179 

0.6676 

0.5438 

0.4420 

0.3585 

0.2901 

0.2342 

0.1887 

0.1516 

0.1216 


1 

0.9831 

0.9401 

0.8802 

0.8103 

0.7358 

0.6605 

0.5869 

0.5169 

0.4516 

0.3917 


2 

0.9990 

0.9929 

0.9790 

0.9561 

0.9245 

0.8850 

0.8390 

0.7879 

0.7334 

0.6769 


3 

1.0000 

0.9994 

0.9973 

0.9926 

0.9841 

0.9710 

0.9529 

0.9294 

0.9007 

0.8670 


4 

1.0000 

1.0000 

0.9997 

0.9990 

0.9974 

0.9944 

0.9893 

0.9817 

0.9710 

0.9568 


5 

1.0000 

1.0000 

1.0000 

0.9999 

0.9997 

0.9991 

0.9981 

0.9962 

0.9932 

0.9887 


6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9997 

0.9994 

0.9987 

0.9976 


7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9996 


8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 


9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


x \ 

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 


0 

0.0972 

0.0776 

0.0617 

0.0490 

0.0388 

0.0306 

0.0241 

0.0189 

0.0148 

0.0115 


1 

0.3376 

0.2891 

0.2461 

0.2084 

0.1756 

0.1471 

0.1227 

0.1018 

0.0841 

0.0692 


2 

0.6198 

0.5631 

0.5080 

0.4550 

0.4049 

0.3580 

0.3146 

0.2748 

0.2386 

0.2061 


3 

0.8290 

0.7873 

0.7427 

0.6959 

0.6477 

0.5990 

0.5504 

0.5026 

0.4561 

0.4114 


4 

0.9390 

0.9173 

0.8917 

0.8625 

0.8298 

0.7941 

0.7557 

0.7151 

0.6729 

0.6296 


5 

0.9825 

0.9740 

0.9630 

0.9493 

0.9327 

0.9130 

0.8902 

0.8644 

0.8357 

0.8042 


6 

0.9959 

0.9933 

0.9897 

0.9847 

0.9781 

0.9696 

0.9591 

0.9463 

0.9311 

0.9133 


7 

0.9992 

0.9986 

0.9976 

0.9962 

0.9941 

0.9912 

0.9873 

0.9823 

0.9759 

0.9679 


8 

0.9999 

0.9998 

0.9995 

0.9992 

0.9987 

0.9979 

0.9967 

0.9951 

0.9929 

0.9900 


9 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9996 

0.9993 

0.9989 

0.9983 

0.9974 


10 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9996 

0.9994 


11 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 


12 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


\p 
x \ 

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 


0 

0.0090 

0.0069 

0.0054 

0.0041 

0.0032 

0.0024 

0.0018 

0.0014 

0.0011 

0.0008 


1 

0.0566 

0.0461 

0.0374 

0.0302 

0.0243 

0.0195 

0.0155 

0.0123 

0.0097 

0.0076 


2 

0.1770 

0.1512 

0.1284 

0.1085 

0.0913 

0.0763 

0.0635 

0.0526 

0.0433 

0.0355 


3 

0.3690 

0.3289 

0.2915 

0.2569 

0.2252 

0.1962 

0.1700 

0.1466 

0.1256 

0.1071 


4 

0.5858 

0.5420 

0.4986 

0.4561 

0.4148 

0.3752 

0.3375 

0.3019 

0.2685 

0.2375 


5 

0.7703 

0.7343 

0.6965 

0.6573 

0.6172 

0.5765 

0.5357 

0.4952 

0.4553 

0.4164 


6 

0.8929 

0.8699 

0.8442 

0.8162 

0.7858 

0.7533 

0.7190 

0.6831 

0.6460 

0.6080 


7 

0.9581 

0.9464 

0.9325 

0.9165 

0.8982 

0.8775 

0.8545 

0.8293 

0.8018 

0.7723 


8 

0.9862 

0.9814 

0.9754 

0.9680 

0.9591 

0.9485 

0.9360 

0.9216 

0.9052 

0.8867 


9 

0.9962 

0.9946 

0.9925 

0.9897 

0.9861 

0.9817 

0.9762 

0.9695 

0.9615 

0.9520 


10 

0.9991 

0.9987 

0.9981 

0.9972 

0.9961 

0.9945 

0.9926 

0.9900 

0.9868 

0.9829 





TABLE A 

(continued) 


n = 20 (continued) 


X \ 

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 

11 

0.9998 

0.9997 

0.9996 

0.9994 

0.9991 

0.9986 

0.9981 

0.9973 

0.9962 

0.9949 

12 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9994 

0.9991 

0.9987 

13 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

14 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x 

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 

0 

0.0006 

0.0004 

0.0003 

0.0002 

0.0002 

0.0001 

0.0001 

0.0001 

0.0001 

0.0000 

1 

0.0060 

0.0047 

0.0036 

0.0028 

0.0021 

0.0016 

0.0012 

0.0009 

0.0007 

0.0005 

2 

0.0289 

0.0235 

0.0189 

0.0152 

0.0121 

0.0096 

0.0076 

0.0060 

0.0047 

0.0036 

3 

0.0908 

0.0765 

0.0642 

0.0535 

0.0444 

0.0366 

0.0300 

0.0245 

0.0198 

0.0160 

4 

0.2089 

0.1827 

0.1589 

0.1374 

0.1182 

0.1011 

0.0859 

0.0726 

0.0610 

0.0510 

5 

0.3787 

0.3426 

0.3082 

0.2758 

0.2454 

0.2171 

0.1910 

0.1671 

0.1453 

0.1256 

6 

0.5695 

0.5307 

0.4921 

0.4540 

0.4166 

0.3803 

0.3453 

0.3118 

0.2800 

0.2500 

7 

0.7409 

0.7078 

0.6732 

0.6376 

0.6010 

0.5639 

0.5265 

0.4892 

0.4522 

0.4159 

8 

0.8660 

0.8432 

0.8182 

0.7913 

0.7624 

0.7317 

0.6995 

0.6659 

0.6312 

0.5956 

9 

0.9409 

0.9281 

0.9134 

0.8968 

0.8782 

0.8576 

0.8350 

0.8103 

0.7837 

0.7553 

10 

0.9780 

0.9721 

0.9650 

0.9566 

0.9468 

0.9355 

0.9225 

0.9077 

0.8910 

0.8725 

11 

0.9931 

0.9909 

0.9881 

0.9846 

0.9804 

0.9753 

0.9692 

0.9619 

0.9534 

0.9435 

12 

0.9982 

0.9975 

0.9966 

0.9955 

0.9940 

0.9921 

0.9898 

0.9868 

0.9833 

0.9790 

13 

0.9996 

0.9994 

0.9992 

0.9989 

0.9985 

0.9979 

0.9972 

0.9963 

0.9951 

0.9935 

14 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9994 

0.9991 

0.9988 

0.9984 

15 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

16 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x 

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 

0 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

1 

0.0004 

0.0003 

0.0002 

0.0002 

0.0001 

0.0001 

0.0001 

0.0000 

0.0000 

0.0000 

2 

0.0028 

0.0021 

0.0016 

0.0012 

0.0009 

0.0007 

0.0005 

0.0004 

0.0003 

0.0002 

3 

0.0128 

0.0102 

0.0080 

0.0063 

0.0049 

0.0038 

0.0029 

0.0023 

0.0017 

0.0013 

4 

0.0423 

0.0349 

0.0286 

0.0233 

0.0189 

0.0152 

0.0121 

0.0096 

0.0076 

0.0059 

5 

0.1079 

0.0922 

0.0783 

0.0660 

0.0553 

0.0461 

0.0381 

0.0313 

0.0255 

0.0207 

6 

0.2220 

0.1959 

0.1719 

0.1499 

0.1299 

0.1119 

0.0958 

0.0814 

0.0688 

0.0577 

7 

0.3804 

0.3461 

0.3132 

0.2817 

0.2520 

0.2241 

0.1980 

0.1739 

0.1518 

0.1316 

8 

0.5594 

0.5229 

0.4864 

0.4501 

0.4143 

0.3793 

0.3454 

0.3127 

0.2814 

0.2517 

9 

0.7252 

0.6936 

0.6606 

0.6264 

0.5914 

0.5557 

0.5196 

0.4834 

0.4474 

0.4119 

10 

0.8520 

0.8295 

0.8051 

0.7788 

0.7507 

0.7209 

0.6896 

0.6568 

0.6229 

0.5881 

11 

0.9321 

0.9190 

0.9042 

0.8877 

0.8692 

0.8489 

0.8266 

0.8024 

0.7762 

0.7483 

12 

0.9738 

0.9676 

0.9603 

0.9518 

0.9420 

0.9306 

0.9177 

0.9031 

0.8867 

0.8684 

13 

0.9916 

0.9893 

0.9864 

0.9828 

0.9786 

0.9735 

0.9674 

0.9603 

0.9520 

0.9423 

14 

0.9978 

0.9971 

0.9962 

0.9950 

0.9936 

0.9917 

0.9895 

0.9867 

0.9834 

0.9793 

15 

0.9996 

0.9994 

0.9992 

0.9989 

0.9985 

0.9980 

0.9973 

0.9965 

0.9954 

0.9941 

16 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9995 

0.9993 

0.9990 

0.9987 

17 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9999 

0.9998 

18 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


n = 21 


\p 
x 

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 

0 

0.8097 

0.6543 

0.5275 

0.4243 

0.3406 

0.2727 

0.2178 

0.1736 

0.1380 

0.1094 

1 

0.9815 

0.9347 

0.8701 

0.7956 

0.7170 

0.6382 

0.5622 

0.4906 

0.4246 

0.3647 


TABLE A 

(continued) 


n = 21 (continued) 


X 


0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 


2 

0.9988 

0.9919 

0.9760 

0.9503 

0.9151 

0.8716 

0.8213 

0.7663 

0.7081 

0.6484 


3 

0.9999 

0.9993 

0.9968 

0.9911 

0.9811 

0.9659 

0.9449 

0.9181 

0.8856 

0.8480 


4 

1.0000 

1.0000 

0.9997 

0.9988 

0.9968 

0.9930 

0.9867 

0.9775 

0.9646 

0.9478 


5 

1.0000 

1.0000 

1.0000 

0.9999 

0.9996 

0.9988 

0.9975 

0.9950 

0.9912 

0.9856 


6 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9998 

0.9996 

0.9991 

0.9982 

0.9967 


7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9997 

0.9994 


8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 


9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

\p 

x 


0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 


0 

0.0865 

0.0683 

0.0537 

0.0421 

0.0329 

0.0257 

0.0200 

0.0155 

0.0120 

0.0092 


1 

0.3111 

0.2637 

0.2222 

0.1861 

0.1550 

0.1285 

0.1059 

0.0869 

0.0709 

0.0576 


2 

0.5887 

0.5302 

0.4739 

0.4205 

0.3705 

0.3243 

0.2820 

0.2437 

0.2093 

0.1787 


3 

0.8060 

0.7604 

0.7122 

0.6622 

0.6113 

0.5604 

0.5103 

0.4616 

0.4148 

0.3704 


4 

0.9269 

0.9017 

0.8724 

0.8392 

0.8025 

0.7629 

0.7208 

0.6769 

0.6317 

0.5860 


5 

0.9777 

0.9672 

0.9538 

0.9372 

0.9173 

0.8940 

0.8674 

0.8375 

0.8047 

0.7693 


6 

0.9944 

0.9910 

0.9862 

0.9797 

0.9713 

0.9606 

0.9474 

0.9316 

0.9130 

0.8915 


7 

0.9988 

0.9980 

0.9966 

0.9945 

0.9917 

0.9877 

0.9825 

0.9758 

0.9674 

0.9569 


8 

0.9998 

0.9996 

0.9993 

0.9988 

0.9980 

0.9968 

0.9951 

0.9928 

0.9897 

0.9856 


9 

1.0000 

0.9999 

0.9999 

0.9998 

0.9996 

0.9993 

0.9989 

0.9982 

0.9973 

0.9959 


10 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9996 

0.9994 

0.9990 


11 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 


12 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

\p 

x 


0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 


0 

0.0071 

0.0054 

0.0041 

0.0031 

0.0024 

0.0018 

0.0013 

0.0010 

0.0008 

0.0006 


1 

0.0466 

0.0375 

0.0301 

0.0240 

0.0190 

0.0150 

0.0118 

0.0093 

0.0072 

0.0056 


2 

0.1517 

0.1281 

0.1075 

0.0898 

0.0745 

0.0615 

0.0506 

0.0413 

0.0336 

0.0271 


3 

0.3286 

0.2898 

0.2540 

0.2213 

0.1917 

0.1650 

0.1413 

0.1202 

0.1018 

0.0856 


4 

0.5403 

0.4951 

0.4509 

0.4083 

0.3674 

0.3287 

0.2923 

0.2584 

0.2271 

0.1984 


5 

0.7316 

0.6920 

0.6509 

0.6090 

0.5666 

0.5242 

0.4822 

0.4411 

0.4011 

0.3627 


6 

0.8672 

0.8400 

0.8103 

0.7780 

0.7436 

0.7073 

0.6695 

0.6305 

0.5907 

0.5505 


7 

0.9444 

0.9295 

0.9122 

0.8924 

0.8701 

0.8452 

0.8179 

0.7883 

0.7566 

0.7230 


8 

0.9803 

0.9737 

0.9655 

0.9556 

0.9439 

0.9300 

0.9140 

0.8958 

0.8752 

0.8523 


9 

0.9941 

0.9917 

0.9885 

0.9845 

0.9794 

0.9731 

0.9654 

0.9561 

0.9452 

0.9324 


10 

0.9985 

0.9978 

0.9968 

0.9954 

0.9936 

0.9912 

0.9881 

0.9843 

0.9795 

0.9736 


11 

0.9997 

0.9995 

0.9992 

0.9989 

0.9983 

0.9976 

0.9966 

0.9952 

0.9935 

0.9913 


12 

0.9999 

0.9999 

0.9998 

0.9998 

0.9996 

0.9994 

0.9992 

0.9988 

0.9983 

0.9976 


13 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9994 


14 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 


15 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

\p 

x 


0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 


0 

0.0004 

0.0003 

0.0002 

0.0002 

0.0001 

0.0001 

0.0001 

0.0000 

0.0000 

0.0000 


1 

0.0043 

0.0033 

0.0025 

0.0019 

0.0014 

0.0011 

0.0008 

0.0006 

0.0004 

0.0003 


2 

0.0218 

0.0174 

0.0139 

0.0110 

0.0086 

0.0067 

0.0052 

0.0041 

0.0031 

0.0024 


3 

0.0716 

0.0596 

0.0492 

0.0405 

0.0331 

0.0269 

0.0217 

0.0174 

0.0139 

0.0110 


4 

0.1723 

0.1487 

0.1277 

0.1089 

0.0924 

0.0779 

0.0652 

0.0543 

0.0449 

0.0370 


5 

0.3261 

0.2914 

0.2590 

0.2288 

0.2009 

0.1753 

0.1521 

0.1312 

0.1124 

0.0957 


6 

0.5103 

0.4705 

0.4314 

0.3934 

0.3567 

0.3216 

0.2882 

0.2568 

0.2275 

0.2002 




TABLE A 

(continued) 


n = 21 (continued) 


\p 

X \ 

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 

7 

0.6877 

0.6511 

0.6135 

0.5752 

0.5365 

0.4978 

0.4595 

0.4218 

0.3851 

0.3495 

8 

0.8272 

0.7998 

0.7704 

0.7390 

0.7059 

0.6713 

0.6355 

0.5988 

0.5614 

0.5237 

9 

0.9177 

0.9009 

0.8820 

0.8609 

0.8377 

0.8123 

0.7848 

0.7554 

0.7243 

0.6914 

10 

0.9665 

0.9580 

0.9480 

0.9363 

0.9228 

0.9074 

0.8901 

0.8707 

0.8492 

0.8256 

11 

0.9884 

0.9849 

0.9805 

0.9751 

0.9687 

0.9610 

0.9519 

0.9413 

0.9291 

0.9151 

12 

0.9966 

0.9954 

0.9938 

0.9918 

0.9892 

0.9861 

0.9821 

0.9774 

0.9716 

0.9648 

13 

0.9992 

0.9988 

0.9984 

0.9977 

0.9969 

0.9958 

0.9944 

0.9927 

0.9905 

0.9877 

14 

0.9998 

0.9998 

0.9996 

0.9995 

0.9993 

0.9990 

0.9986 

0.9980 

0.9973 

0.9964 

15 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9994 

0.9992 

16 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

17 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x 

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 

0 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

1 

0.0002 

0.0002 

0.0001 

0.0001 

0.0001 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

2 

0.0018 

0.0014 

0.0010 

0.0008 

0.0006 

0.0004 

0.0003 

0.0002 

0.0002 

0.0001 

3 

0.0087 

0.0068 

0.0053 

0.0041 

0.0031 

0.0024 

0.0018 

0.0014 

0.0010 

0.0007 

4 

0.0302 

0.0245 

0.0198 

0.0158 

0.0126 

0.0099 

0.0078 

0.0061 

0.0047 

0.0036 

5 

0.0810 

0.0681 

0.0569 

0.0472 

0.0389 

0.0319 

0.0259 

0.0209 

0.0167 

0.0133 

6 

0.1752 

0.1523 

0.1316 

0.1130 

0.0964 

0.0816 

0.0687 

0.0574 

0.0476 

0.0392 

7 

0.3155 

0.2830 

0.2524 

0.2237 

0.1971 

0.1725 

0.1500 

0.1295 

0.1111 

0.0946 

8 

0.4860 

0.4487 

0.4119 

0.3760 

0.3413 

0.3079 

0.2761 

0.2461 

0.2179 

0.1917 

9 

0.6572 

0.6219 

0.5856 

0.5488 

0.5117 

0.4746 

0.4377 

0.4015 

0.3661 

0.3318 

10 

0.8000 

0.7724 

0.7429 

0.7118 

0.6790 

0.6449 

0.6097 

0.5736 

0.5369 

0.5000 

11 

0.8992 

0.8814 

0.8616 

0.8398 

0.8159 

0.7900 

0.7622 

0.7325 

0.7011 

0.6682 

12 

0.9567 

0.9472 

0.9362 

0.9236 

0.9092 

0.8930 

0.8749 

0.8547 

0.8326 

0.8083 

13 

0.9843 

0.9802 

0.9752 

0.9692 

0.9621 

0.9538 

0.9441 

0.9328 

0.9200 

0.9054 

14 

0.9953 

0.9938 

0.9920 

0.9897 

0.9868 

0.9833 

0.9791 

0.9740 

0.9680 

0.9608 

15 

0.9989 

0.9984 

0.9979 

0.9972 

0.9963 

0.9951 

0.9936 

0.9918 

0.9895 

0.9867 

16 

0.9998 

0.9997 

0.9996 

0.9994 

0.9992 

0.9989 

0.9985 

0.9979 

0.9973 

0.9964 

17 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9995 

0.9993 

18 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

19 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


n = 22 


\p 

X 

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 

0 

0.8016 

0.6412 

0.5117 

0.4073 

0.3235 

0.2563 

0.2026 

0.1597 

0.1256 

0.0985 

1 

0.9798 

0.9290 

0.8598 

0.7808 

0.6982 

0.6163 

0.5381 

0.4652 

0.3988 

0.3392 

2 

0.9987 

0.9907 

0.9728 

0.9441 

0.9052 

0.8576 

0.8032 

0.7442 

0.6826 

0.6200 

3 

0.9999 

0.9991 

0.9962 

0.9895 

0.9778 

0.9602 

0.9362 

0.9059 

0.8696 

0.8281 

4 

1.0000 

0.9999 

0.9996 

0.9985 

0.9960 

0.9913 

0.9838 

0.9727 

0.9575 

0.9379 

5 

1.0000 

1.0000 

1.0000 

0.9998 

0.9994 

0.9985 

0.9967 

0.9936 

0.9888 

0.9818 

6 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9995 

0.9988 

0.9976 

0.9956 

7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9996 

0.9991 

8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

\P 

x 

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 

0 

0.0770 

0.0601 

0.0467 

0.0362 

0.0280 

0.0216 

0.0166 

0.0127 

0.0097 

0.0074 

1 

0.2864 

0.2403 

0.2003 

0.1659 

0.1367 

0.1120 

0.0913 

0.0740 

0.0597 

0.0480 



TABLE A 

( continued) 


n = 22 ( continued ) 


X \ 

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 

2 

0.5582 

0.4983 

0.4412 

0.3877 

0.3382 

0.2929 

0.2520 

0.2154 

0.1830 

0.1545 

3 

0.7821 

0.7328 

0.6812 

0.6283 

0.5752 

0.5226 

0.4715 

0.4224 

0.3758 

0.3320 

4 

0.9136 

0.8847 

0.8515 

0.8144 

0.7738 

0.7305 

0.6850 

0.6381 

0.5905 

0.5429 

5 

0.9721 

0.9593 

0.9432 

0.9235 

0.9001 

0.8730 

0.8424 

0.8086 

0.7719 

0.7326 

6 

0.9926 

0.9881 

0.9820 

0.9738 

0.9632 

0.9499 

0.9338 

0.9147 

0.8924 

0.8670 

7 

0.9984 

0.9971 

0.9952 

0.9925 

0.9886 

0.9834 

0.9766 

0.9679 

0.9570 

0.9439 

8 

0.9997 

0.9994 

0.9989 

0.9982 

0.9970 

0.9954 

0.9930

0.9898 

0.9854 

0.9799 

9 

1.0000 

0.9999 

0.9998 

0.9996 

0.9993 

0.9989 

0.9982 

0.9972 

0.9958 

0.9939 

10 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9996 

0.9994 

0.9990 

0.9984 

11 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999

0.9999 

0.9998 

0.9997 

12 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

13 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ 

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 

0 

0.0056 

0.0042 

0.0032 

0.0024 

0.0018 

0.0013 

0.0010

0.0007 

0.0005 

0.0004 

1 

0.0383 

0.0305 

0.0241 

0.0190 

0.0149 

0.0116 

0.0090 

0.0069 

0.0053 

0.0041 

2 

0.1296 

0.1081 

0.0897 

0.0740 

0.0606 

0.0495 

0.0401 

0.0323 

0.0259 

0.0207 

3 

0.2915 

0.2542 

0.2203 

0.1897 

0.1624 

0.1381 

0.1168 

0.0981 

0.0820 

0.0681 

4 

0.4958 

0.4499 

0.4057 

0.3634 

0.3235 

0.2861 

0.2515

0.2197 

0.1907 

0.1645 

5 

0.6914 

0.6486 

0.6050 

0.5608 

0.5168 

0.4733 

0.4309 

0.3899 

0.3507 

0.3134 

6 

0.8387 

0.8075 

0.7736 

0.7375 

0.6994 

0.6597 

0.6189

0.5774 

0.5357 

0.4942 

7 

0.9282 

0.9098 

0.8888 

0.8650 

0.8385 

0.8094 

0.7779 

0.7441 

0.7085 

0.6712 

8 

0.9728 

0.9640 

0.9533 

0.9405 

0.9254 

0.9080 

0.8881

0.8657 

0.8408 

0.8135 

9 

0.9912 

0.9877 

0.9832 

0.9776 

0.9705 

0.9619 

0.9515

0.9392 

0.9249 

0.9084 

10 

0.9976 

0.9964 

0.9949 

0.9928 

0.9900 

0.9865 

0.9820

0.9764 

0.9695 

0.9613 

11 

0.9994 

0.9991 

0.9987 

0.9980 

0.9971 

0.9959 

0.9943

0.9922 

0.9894 

0.9860 

12 

0.9999 

0.9998 

0.9997 

0.9995

0.9993 

0.9990 

0.9985

0.9978 

0.9969 

0.9957 

13 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997

0.9995 

0.9992 

0.9989 

14 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9998 

15 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x 

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 

0 

0.0003 

0.0002 

0.0001 

0.0001 

0.0001 

0.0001 

0.0000

0.0000 

0.0000 

0.0000 

1 

0.0031 

0.0023 

0.0018 

0.0013 

0.0010 

0.0007 

0.0005 

0.0004 

0.0003 

0.0002 

2 

0.0164 

0.0129 

0.0101 

0.0079 

0.0061 

0.0047 

0.0036

0.0027 

0.0021 

0.0016 

3 

0.0562 

0.0461 

0.0376 

0.0304 

0.0245 

0.0196 

0.0156 

0.0123 

0.0097 

0.0076 

4 

0.1411 

0.1202 

0.1018 

0.0856 

0.0716 

0.0595 

0.0491 

0.0403 

0.0328 

0.0266 

5 

0.2784 

0.2458 

0.2156 

0.1880 

0.1629 

0.1402 

0.1200 

0.1020 

0.0861 

0.0722 

6 

0.4532 

0.4132 

0.3745 

0.3374 

0.3022 

0.2689 

0.2379 

0.2091 

0.1826 

0.1584 

7 

0.6327 

0.5933 

0.5534 

0.5134 

0.4736 

0.4344 

0.3961 

0.3591 

0.3236 

0.2898 

8 

0.7840 

0.7522 

0.7186 

0.6833 

0.6466 

0.6089 

0.5704 

0.5315 

0.4926 

0.4540 

9 

0.8896 

0.8686 

0.8452 

0.8195 

0.7916 

0.7615 

0.7296 

0.6959 

0.6607 

0.6244 

10 

0.9514 

0.9397 

0.9262 

0.9107 

0.8930 

0.8732 

0.8511 

0.8269 

0.8005 

0.7720 

11 

0.9816 

0.9763 

0.9698 

0.9619 

0.9526 

0.9417 

0.9290

0.9145 

0.8979 

0.8793 

12 

0.9941 

0.9920 

0.9894 

0.9861 

0.9820 

0.9770 

0.9710

0.9637 

0.9550 

0.9449 

13 

0.9984 

0.9977 

0.9969 

0.9957 

0.9942 

0.9923 

0.9899 

0.9869 

0.9831 

0.9785 

14 

0.9996 

0.9995 

0.9992 

0.9989 

0.9984 

0.9978 

0.9970

0.9960 

0.9947 

0.9930 

15 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

0.9995 

0.9993 

0.9990 

0.9986 

0.9981 

16 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999

0.9998 

0.9997 

0.9996 

17 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

18 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 



TABLE A 

(continued) 


n = 22 (continued)


X \ 

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 

0

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

1 

0.0001 

0.0001 

0.0001 

0.0001 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

2 

0.0012 

0.0009 

0.0006 

0.0005 

0.0003 

0.0002 

0.0002 

0.0001 

0.0001 

0.0001 

3 

0.0059 

0.0045 

0.0035 

0.0026 

0.0020 

0.0015 

0.0011 

0.0008 

0.0006 

0.0004 

4 

0.0214 

0.0171 

0.0135 

0.0107 

0.0083 

0.0065 

0.0050 

0.0038 

0.0029 

0.0022 

5 

0.0602 

0.0498 

0.0409 

0.0334 

0.0271 

0.0218 

0.0174 

0.0138 

0.0108 

0.0085 

6 

0.1366 

0.1170 

0.0995 

0.0841 

0.0705 

0.0587 

0.0486 

0.0399 

0.0325 

0.0262 

7 

0.2580 

0.2281 

0.2005 

0.1750 

0.1518 

0.1307 

0.1118 

0.0949 

0.0800 

0.0669 

8 

0.4161 

0.3791 

0.3433 

0.3090 

0.2764 

0.2456 

0.2168 

0.1901 

0.1656 

0.1431 

9 

0.5870 

0.5491 

0.5109 

0.4728 

0.4350 

0.3979 

0.3618 

0.3269 

0.2935 

0.2617 

10 

0.7415 

0.7092 

0.6753 

0.6401 

0.6037 

0.5665 

0.5289 

0.4910 

0.4532 

0.4159 

11 

0.8585 

0.8356 

0.8106 

0.7834 

0.7543 

0.7233 

0.6905 

0.6562 

0.6207 

0.5841 

12 

0.9331 

0.9196 

0.9041 

0.8867 

0.8672 

0.8456 

0.8219 

0.7961 

0.7682 

0.7383 

13 

0.9730 

0.9663 

0.9584 

0.9491 

0.9383 

0.9258 

0.9115 

0.8953 

0.8771 

0.8569 

14 

0.9908 

0.9881 

0.9848 

0.9807 

0.9757 

0.9697 

0.9626 

0.9543 

0.9445 

0.9331 

15 

0.9974 

0.9965 

0.9953 

0.9939 

0.9920 

0.9897 

0.9868 

0.9833 

0.9790 

0.9738 

16 

0.9994 

0.9992 

0.9988 

0.9984 

0.9979 

0.9971 

0.9962 

0.9950 

0.9935 

0.9915 

17 

0.9999 

0.9998 

0.9998 

0.9997 

0.9995 

0.9994 

0.9991 

0.9988 

0.9984 

0.9978 

18 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

0.9996 

19 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

20 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


n = 23


x 

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 

0 

0.7936 

0.6283 

0.4963 

0.3911 

0.3074 

0.2410 

0.1884 

0.1469 

0.1143 

0.0886 

1 

0.9780 

0.9233 

0.8493 

0.7658 

0.6794 

0.5947 

0.5146 

0.4408 

0.3742 

0.3151 

2 

0.9985 

0.9895 

0.9695 

0.9376 

0.8948 

0.8431 

0.7846 

0.7219 

0.6570 

0.5920 

3 

0.9999 

0.9990 

0.9955 

0.9877 

0.9742 

0.9541 

0.9269 

0.8930 

0.8528 

0.8073 

4 

1.0000 

0.9999 

0.9995 

0.9981 

0.9951 

0.9895 

0.9805 

0.9674 

0.9496 

0.9269 

5 

1.0000 

1.0000 

1.0000

0.9998 

0.9992 

0.9981 

0.9958 

0.9920 

0.9860 

0.9774 

6 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9997 

0.9993 

0.9984 

0.9968 

0.9942 

7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9997 

0.9994 

0.9988 

8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 

0 

0.0685 

0.0529 

0.0406 

0.0312 

0.0238 

0.0181 

0.0138 

0.0104 

0.0079 

0.0059 

1 

0.2634 

0.2186 

0.1803 

0.1478 

0.1204 

0.0976 

0.0786 

0.0630 

0.0502 

0.0398 

2 

0.5283 

0.4673 

0.4099 

0.3566 

0.3080 

0.2640 

0.2247 

0.1900 

0.1596 

0.1332 

3 

0.7575 

0.7047 

0.6500 

0.5946 

0.5396 

0.4859 

0.4342 

0.3851 

0.3391 

0.2965 

4 

0.8991 

0.8665 

0.8294 

0.7883 

0.7440 

0.6972 

0.6487 

0.5993 

0.5497 

0.5007 

5 

0.9656 

0.9504 

0.9313 

0.9082 

0.8811 

0.8502 

0.8157 

0.7779 

0.7374 

0.6947 

6 

0.9903 

0.9847 

0.9769 

0.9667 

0.9537 

0.9376 

0.9183 

0.8956 

0.8695 

0.8402 

7 

0.9977 

0.9960 

0.9935 

0.9899 

0.9848 

0.9780 

0.9693 

0.9583 

0.9447 

0.9285 

8 

0.9995 

0.9991 

0.9985 

0.9974 

0.9958 

0.9934 

0.9902 

0.9858 

0.9800 

0.9727 

9 

0.9999 

0.9998 

0.9997 

0.9994 

0.9990 

0.9983 

0.9973 

0.9959 

0.9938 

0.9911 

10 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9996 

0.9994 

0.9990 

0.9984 

0.9975 

11 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9996 

0.9994 

12 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

13 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


TABLE A 

( continued) 


n = 23 ( continued ) 


0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.30 


0 

0.0044 

0.0033 

0.0025 

0.0018 

0.0013 

0.0010 

0.0007 

0.0005 

0.0004 

0.0003 

1 

0.0314 

0.0247 

0.0193 

0.0150 

0.0116 

0.0089 

0.0068 

0.0052 

0.0039 

0.0030 

2 

0.1105 

0.0911 

0.0746 

0.0608 

0.0492 

0.0396 

0.0317 

0.0252 

0.0200 

0.0157 

3 

0.2575 

0.2221 

0.1903 

0.1620 

0.1370 

0.1151 

0.0961 

0.0797 

0.0657 

0.0538 

4 

0.4529 

0.4069 

0.3630 

0.3217 

0.2832 

0.2477 

0.2151 

0.1857 

0.1592 

0.1356 

5 

0.6503 

0.6049 

0.5591 

0.5134 

0.4685 

0.4247 

0.3825 

0.3423 

0.3043 

0.2688 

6 

0.8077 

0.7725 

0.7348 

0.6951 

0.6537 

0.6113 

0.5682 

0.5249 

0.4821 

0.4399 

7 

0.9094 

0.8873 

0.8623 

0.8344 

0.8037 

0.7705 

0.7349 

0.6975 

0.6584 

0.6181 

8 

0.9634 

0.9521 

0.9384 

0.9223 

0.9037 

0.8823 

0.8583 

0.8317 

0.8025 

0.7709 

9 

0.9873 

0.9825 

0.9763

0.9687 

0.9592 

0.9479 

0.9344 

0.9186 

0.9005 

0.8799 

10 

0.9963 

0.9945 

0.9922 

0.9891 

0.9851 

0.9801 

0.9738 

0.9660 

0.9566 

0.9454 

11 

0.9991 

0.9985 

0.9978 

0.9968 

0.9954 

0.9935 

0.9910 

0.9878 

0.9837 

0.9786 

12 

0.9998 

0.9997 

0.9995 

0.9992 

0.9988 

0.9982 

0.9973 

0.9962 

0.9947 

0.9928 

13 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9993 

0.9990 

0.9985 

0.9979 

14 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9995 

15 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

16 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x 

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 

0 

0.0002 

0.0001 

0.0001 

0.0001 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

1 

0.0022 

0.0017 

0.0012 

0.0009 

0.0007 

0.0005 

0.0004 

0.0003 

0.0002 

0.0001 

2 

0.0123 

0.0095 

0.0074 

0.0057 

0.0043 

0.0033 

0.0025 

0.0018 

0.0014 

0.0010 

3 

0.0438 

0.0355 

0.0285 

0.0228 

0.0181 

0.0143 

0.0112 

0.0087 

0.0067 

0.0052 

4 

0.1148 

0.0965 

0.0806 

0.0669 

0.0551 

0.0451 

0.0367 

0.0297 

0.0238 

0.0190 

5 

0.2358 

0.2056 

0.1781 

0.1532 

0.1309 

0.1112 

0.0938 

0.0785 

0.0653 

0.0540 

6 

0.3990 

0.3596 

0.3221 

0.2866 

0.2534 

0.2226 

0.1942 

0.1684 

0.1450 

0.1240 

7 

0.5771 

0.5357 

0.4944 

0.4535 

0.4136 

0.3748 

0.3376 

0.3021 

0.2686 

0.2373 

8 

0.7371 

0.7014 

0.6641 

0.6255 

0.5860 

0.5460 

0.5059 

0.4660 

0.4267 

0.3884 

9 

0.8569 

0.8313 

0.8034 

0.7732 

0.7408 

0.7066 

0.6707 

0.6334 

0.5952 

0.5562 

10 

0.9322 

0.9170 

0.8995 

0.8797 

0.8575 

0.8330 

0.8062 

0.7771 

0.7460 

0.7129 

11 

0.9722 

0.9646 

0.9554 

0.9445 

0.9318 

0.9170 

0.9002 

0.8812 

0.8599 

0.8364 

12 

0.9902 

0.9870 

0.9829 

0.9779 

0.9717 

0.9643 

0.9554 

0.9450 

0.9328 

0.9187 

13 

0.9971 

0.9959 

0.9944 

0.9925 

0.9900 

0.9868 

0.9829 

0.9780 

0.9722 

0.9651 

14 

0.9992 

0.9989 

0.9984 

0.9978 

0.9970 

0.9958 

0.9944 

0.9925 

0.9902 

0.9872 

15 

0.9998 

0.9998 

0.9996 

0.9995 

0.9992 

0.9989 

0.9985 

0.9979 

0.9971 

0.9960 

16 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9998 

0.9996 

0.9995 

0.9993 

0.9990 

17 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

18 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


0.41 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.50 



0 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000

0.0000 

0.0000 

0.0000 

1 

0.0001 

0.0001 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

2 

0.0007 

0.0005 

0.0004 

0.0003 

0.0002 

0.0001 

0.0001 

0.0001 

0.0000 

0.0000 

3 

0.0039 

0.0030 

0.0022 

0.0017 

0.0012 

0.0009 

0.0007 

0.0005 

0.0003 

0.0002 

4 

0.0150 

0.0118 

0.0092 

0.0071 

0.0055 

0.0042 

0.0032 

0.0024 

0.0018 

0.0013 

5 

0.0443 

0.0361 

0.0292 

0.0234 

0.0186 

0.0147 

0.0116 

0.0090 

0.0069 

0.0053 

6 

0.1053 

0.0888

0.0743 

0.0618 

0.0510 

0.0417 

0.0339 

0.0273 

0.0219 

0.0173 

7 

0.2082 

0.1815 

0.1571 

0.1350 

0.1152 

0.0976 

0.0821 

0.0685 

0.0567 

0.0466 

8 

0.3513 

0.3157 

0.2819 

0.2500 

0.2203 

0.1927 

0.1674 

0.1444 

0.1236 

0.1050 

9 

0.5170 

0.4777 

0.4389 

0.4007 

0.3636 

0.3278 

0.2936 

0.2612 

0.2308 

0.2024 

10 

0.6782 

0.6420 

0.6046 

0.5665 

0.5278 

0.4890 

0.4503 

0.4122 

0.3749 

0.3388 






TABLE A 

( continued) 


n = 23 ( continued ) 


x \ p

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 


11 

0.8105 

0.7825 

0.7524 

0.7204 

0.6865 

0.6512 

0.6145 

0.5769 

0.5386 

0.5000 


12 

0.9025 

0.8843 

0.8639 

0.8413 

0.8164 

0.7893 

0.7602 

0.7289 

0.6959 

0.6612 


13 

0.9566 

0.9467 

0.9351 

0.9217 

0.9063 

0.8889 

0.8694 

0.8477 

0.8237 

0.7976 


14 

0.9835 

0.9789 

0.9734 

0.9668 

0.9589 

0.9495 

0.9386 

0.9260 

0.9115 

0.8950 


15 

0.9947 

0.9930 

0.9908 

0.9881 

0.9847 

0.9805 

0.9755 

0.9693 

0.9621 

0.9534 


16 

0.9986 

0.9980 

0.9973 

0.9964 

0.9952 

0.9937 

0.9918 

0.9894 

0.9864 

0.9827 


17 

0.9997 

0.9996 

0.9994 

0.9991 

0.9988 

0.9983 

0.9977 

0.9970 

0.9960 

0.9947 


18 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 

0.9996 

0.9995 

0.9993 

0.9990 

0.9987 


19 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9998 


20 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 






n = 24






x \ p

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 


0 

0.7857 

0.6158 

0.4814 

0.3754 

0.2920 

0.2265 

0.1752 

0.1352 

0.1040 

0.0798 


1 

0.9761 

0.9174 

0.8388 

0.7508 

0.6608 

0.5735 

0.4918 

0.4173 

0.3508 

0.2925 


2 

0.9983 

0.9882 

0.9659 

0.9307 

0.8841 

0.8282 

0.7657 

0.6994 

0.6316 

0.5643 


3 

0.9999 

0.9988 

0.9947 

0.9857 

0.9702 

0.9474 

0.9170 

0.8793 

0.8352 

0.7857 


4 

1.0000 

0.9999 

0.9994 

0.9977 

0.9940 

0.9873 

0.9767 

0.9614 

0.9409 

0.9149 


5 

1.0000 

1.0000 

0.9999 

0.9997 

0.9990 

0.9975 

0.9947 

0.9900 

0.9827 

0.9723 


6 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9996 

0.9990 

0.9979 

0.9958 

0.9925 


7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9996 

0.9992 

0.9983 


8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9997 


9 

1.0000 

1.0000 

1.0000

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 


10 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 


0 

0.0610 

0.0465 

0.0354 

0.0268 

0.0202 

0.0152 

0.0114 

0.0085 

0.0064 

0.0047 


1 

0.2420 

0.1987 

0.1621 

0.1315 

0.1059 

0.0849 

0.0676 

0.0535 

0.0422 

0.0331 


2 

0.4992 

0.4375 

0.3800 

0.3274 

0.2798 

0.2374 

0.1999 

0.1671 

0.1388 

0.1145 


3 

0.7323 

0.6762 

0.6188 

0.5613 

0.5049 

0.4504 

0.3986 

0.3500 

0.3050 

0.2639 


4 

0.8835 

0.8471 

0.8061 

0.7612 

0.7134 

0.6634 

0.6122 

0.5607 

0.5097 

0.4599 


5 

0.9583 

0.9403 

0.9180 

0.8914 

0.8606 

0.8257 

0.7873 

0.7458 

0.7017 

0.6559 


6 

0.9876 

0.9806 

0.9710 

0.9585 

0.9428 

0.9236 

0.9008 

0.8744 

0.8444 

0.8111 


7 

0.9969 

0.9947 

0.9913 

0.9866 

0.9801 

0.9716 

0.9606 

0.9470 

0.9304 

0.9108 


8 

0.9993 

0.9988 

0.9978 

0.9963 

0.9941 

0.9910 

0.9866 

0.9809 

0.9733 

0.9638 


9 

0.9999 

0.9998 

0.9995 

0.9991 

0.9985 

0.9976 

0.9961 

0.9941 

0.9912 

0.9874 


10 

1.0000 

1.0000 

0.9999 

0.9998 

0.9997 

0.9994 

0.9990 

0.9984 

0.9975 

0.9962 


11 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9996 

0.9994 

0.9990 


12 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 


13 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 


0 

0.0035 

0.0026 

0.0019 

0.0014 

0.0010 

0.0007 

0.0005 

0.0004 

0.0003 

0.0002 


1 

0.0258 

0.0200 

0.0154 

0.0118 

0.0090 

0.0069 

0.0052 

0.0039 

0.0029 

0.0022 


2 

0.0939 

0.0765 

0.0619 

0.0498 

0.0398 

0.0316 

0.0250 

0.0196 

0.0153 

0.0119 


3 

0.2266 

0.1933 

0.1637 

0.1377 

0.1150 

0.0955 

0.0787 

0.0645 

0.0524 

0.0424 


4 

0.4119 

0.3662 

0.3233 

0.2834 

0.2466 

0.2132 

0.1830 

0.1560 

0.1321 

0.1111 


5 

0.6089 

0.5614 

0.5140 

0.4674 

0.4222 

0.3786 

0.3373 

0.2984 

0.2622 

0.2288 


6 

0.7747 

0.7356 

0.6944 

0.6515 

0.6074 

0.5627 

0.5180 

0.4738 

0.4305 

0.3886 


7 

0.8880 

0.8621 

0.8330 

0.8009 

0.7662 

0.7291 

0.6899 

0.6492 

0.6073 

0.5647 


8 

0.9521 

0.9378 

0.9209 

0.9012 

0.8787 

0.8533 

0.8250 

0.7941 

0.7607 

0.7250 


TABLE A 

(continued) 


n = 24 ( continued ) 


X \ 

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 

9 

0.9823 

0.9758 

0.9676 

0.9575 

0.9453 

0.9308 

0.9138 

0.8943 

0.8721 

0.8472 

10 

0.9944 

0.9919 

0.9886 

0.9842 

0.9787 

0.9717 

0.9631 

0.9527 

0.9403 

0.9258 

11 

0.9985 

0.9977 

0.9965 

0.9949 

0.9928 

0.9900 

0.9863 

0.9817 

0.9758 

0.9686 

12 

0.9996 

0.9994 

0.9991 

0.9986 

0.9979 

0.9969 

0.9956 

0.9938 

0.9915 

0.9885 

13 

0.9999 

0.9999 

0.9998 

0.9997 

0.9995 

0.9992 

0.9988 

0.9982 

0.9974 

0.9964 

14 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9993 

0.9990 

15 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

16 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

X 

0.31 

0.32 

0.33 

0.34

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 

0 

0.0001 

0.0001 

0.0001 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

1 

0.0016 

0.0012 

0.0009 

0.0006 

0.0005 

0.0003 

0.0002 

0.0002 

0.0001 

0.0001 

2 

0.0092 

0.0070 

0.0053 

0.0040 

0.0030 

0.0023 

0.0017 

0.0012 

0.0009 

0.0007 

3 

0.0341 

0.0272 

0.0215 

0.0170 

0.0133 

0.0103 

0.0080 

0.0061 

0.0046 

0.0035 

4 

0.0928 

0.0770 

0.0634 

0.0519 

0.0422 

0.0340 

0.0273 

0.0217 

0.0171 

0.0134 

5 

0.1983 

0.1707 

0.1459

0.1239 

0.1044 

0.0874 

0.0727 

0.0600 

0.0491 

0.0400 

6 

0.3484 

0.3103 

0.2746 

0.2413 

0.2106 

0.1825 

0.1571 

0.1342 

0.1139 

0.0960 

7 

0.5219 

0.4794 

0.4375 

0.3968 

0.3575 

0.3200 

0.2845 

0.2513 

0.2204 

0.1919 

8 

0.6875 

0.6484 

0.6081 

0.5670 

0.5257 

0.4844 

0.4436 

0.4037 

0.3650 

0.3279 

9 

0.8197 

0.7898 

0.7574 

0.7230 

0.6867 

0.6488 

0.6097 

0.5698 

0.5295 

0.4891 

10 

0.9089 

0.8896 

0.8678 

0.8435 

0.8167 

0.7875 

0.7560 

0.7225 

0.6872 

0.6502 

11 

0.9598 

0.9493 

0.9369 

0.9225 

0.9058 

0.8868 

0.8654 

0.8416 

0.8155 

0.7870 

12 

0.9846 

0.9798 

0.9738 

0.9665 

0.9577 

0.9473 

0.9350 

0.9207 

0.9043 

0.8857 

13 

0.9949 

0.9931 

0.9906 

0.9875 

0.9836 

0.9787 

0.9727 

0.9655 

0.9568 

0.9465 

14 

0.9986 

0.9979 

0.9971 

0.9960 

0.9945 

0.9926 

0.9901 

0.9870 

0.9831 

0.9783 

15 

0.9997 

0.9995 

0.9992 

0.9989 

0.9984 

0.9978 

0.9970 

0.9958 

0.9944 

0.9925 

16 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9994 

0.9992 

0.9989 

0.9984 

0.9978 

17 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9995 

18 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

19 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 

0 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

1 

0.0001 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

2 

0.0005 

0.0003 

0.0002 

0.0002 

0.0001 

0.0001 

0.0001 

0.0000 

0.0000 

0.0000 

3 

0.0026 

0.0020 

0.0014 

0.0011 

0.0008 

0.0006 

0.0004 

0.0003 

0.0002 

0.0001 

4 

0.0105 

0.0081 

0.0062 

0.0047 

0.0036 

0.0027 

0.0020 

0.0015 

0.0011 

0.0008 

5 

0.0323 

0.0259 

0.0206 

0.0162 

0.0127 

0.0099 

0.0076 

0.0058 

0.0044 

0.0033 

6 

0.0803 

0.0666 

0.0549 

0.0449 

0.0364 

0.0293 

0.0234 

0.0185 

0.0146 

0.0113 

7 

0.1660 

0.1425 

0.1215 

0.1028 

0.0863 

0.0719 

0.0594 

0.0487 

0.0396 

0.0320 

8 

0.2926 

0.2593 

0.2282 

0.1994 

0.1730 

0.1490 

0.1273 

0.1080 

0.0908 

0.0758 

9 

0.4490 

0.4097 

0.3714 

0.3344 

0.2991 

0.2657 

0.2343 

0.2052 

0.1783 

0.1537 

10 

0.6121 

0.5730 

0.5333 

0.4935 

0.4539 

0.4148 

0.3767 

0.3397 

0.3043 

0.2706 

11 

0.7563 

0.7235 

0.6889 

0.6526 

0.6151 

0.5766 

0.5374 

0.4979 

0.4584 

0.4194 

12 

0.8648 

0.8416 

0.8160 

0.7881 

0.7580 

0.7258 

0.6917 

0.6560 

0.6188 

0.5806 

13 

0.9345 

0.9205 

0.9045 

0.8863 

0.8659 

0.8431 

0.8181 

0.7907 

0.7611 

0.7294 

14 

0.9725 

0.9654 

0.9569 

0.9469 

0.9352 

0.9217 

0.9061 

0.8884 

0.8685 

0.8463 

15 

0.9901 

0.9871 

0.9833 

0.9787 

0.9731 

0.9663 

0.9581 

0.9485 

0.9373 

0.9242 

16 

0.9970 

0.9959 

0.9945 

0.9927 

0.9905 

0.9876 

0.9841 

0.9798 

0.9745 

0.9680 

17 

0.9992 

0.9989 

0.9985 

0.9979 

0.9972 

0.9962 

0.9949 

0.9933 

0.9913 

0.9887 

18 

0.9998 

0.9998 

0.9997 

0.9995 

0.9993 

0.9990 

0.9987 

0.9982 

0.9975 

0.9967 

19 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9994 

0.9992 

20 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 





TABLE A 

( continued ) 


n = 24 (continued) 


x 

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 

21 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 


n = 25


x \ 

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

0.10 

0 

0.7778 

0.6035 

0.4670 

0.3604 

0.2774 

0.2129 

0.1630 

0.1244 

0.0946 

0.0718 

1 

0.9742 

0.9114 

0.8280 

0.7358 

0.6424 

0.5527 

0.4696 

0.3947 

0.3286 

0.2712 

2 

0.9980 

0.9868 

0.9620 

0.9235 

0.8729 

0.8129 

0.7466 

0.6768 

0.6063 

0.5371 

3 

0.9999 

0.9986 

0.9938 

0.9835 

0.9659 

0.9402 

0.9064 

0.8649 

0.8169 

0.7636 

4 

1.0000 

0.9999 

0.9992 

0.9972 

0.9928 

0.9850 

0.9726 

0.9549 

0.9314 

0.9020 

5 

1.0000 

1.0000 

0.9999 

0.9996 

0.9988 

0.9969 

0.9935 

0.9877 

0.9790 

0.9666 

6 

1.0000 

1.0000 

1.0000 

1.0000 

0.9998 

0.9995 

0.9987 

0.9972 

0.9946 

0.9905 

7 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9995 

0.9989 

0.9977 

8 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9995 

9 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

10 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x 

0.11 

0.12 

0.13 

0.14 

0.15 

0.16 

0.17 

0.18 

0.19 

0.20 

0 

0.0543 

0.0409 

0.0308 

0.0230 

0.0172 

0.0128 

0.0095 

0.0070 

0.0052 

0.0038 

1 

0.2221 

0.1805 

0.1457 

0.1168 

0.0931 

0.0737 

0.0580 

0.0454 

0.0354 

0.0274 

2 

0.4709 

0.4088 

0.3517 

0.3000 

0.2537 

0.2130 

0.1774 

0.1467 

0.1204 

0.0982 

3 

0.7066 

0.6475 

0.5877 

0.5286 

0.4711 

0.4163 

0.3648 

0.3171 

0.2734 

0.2340 

4 

0.8669 

0.8266 

0.7817 

0.7332 

0.6821 

0.6293 

0.5759 

0.5228 

0.4708 

0.4207 

5 

0.9501 

0.9291 

0.9035 

0.8732 

0.8385 

0.7998 

0.7575 

0.7125 

0.6653 

0.6167 

6 

0.9844 

0.9757 

0.9641 

0.9491 

0.9305 

0.9080 

0.8815 

0.8512 

0.8173 

0.7800 

7 

0.9959 

0.9930 

0.9887 

0.9827 

0.9745 

0.9639 

0.9505 

0.9339 

0.9141 

0.8909 

8 

0.9991 

0.9983 

0.9970 

0.9950 

0.9920 

0.9879 

0.9822 

0.9748 

0.9652 

0.9532 

9 

0.9998 

0.9996 

0.9993 

0.9987 

0.9979 

0.9965 

0.9945 

0.9917 

0.9878 

0.9827 

10 

1.0000 

0.9999 

0.9999 

0.9997 

0.9995 

0.9991 

0.9985 

0.9976 

0.9963 

0.9944 

11 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9998 

0.9997 

0.9994 

0.9990 

0.9985 

12 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9996 

13 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

14 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x 

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 

0 

0.0028 

0.0020 

0.0015 

0.0010 

0.0008 

0.0005 

0.0004 

0.0003 

0.0002 

0.0001 

1 

0.0211 

0.0162 

0.0123 

0.0093 

0.0070 

0.0053 

0.0039 

0.0029 

0.0021 

0.0016 

2 

0.0796 

0.0640 

0.0512 

0.0407 

0.0321 

0.0252 

0.0196 

0.0152 

0.0117 

0.0090 

3 

0.1987 

0.1676 

0.1403 

0.1166 

0.0962 

0.0789 

0.0642 

0.0519 

0.0417 

0.0332 

4 

0.3730 

0.3282 

0.2866 

0.2484 

0.2137 

0.1826 

0.1548 

0.1304 

0.1090 

0.0905 

5 

0.5675 

0.5184 

0.4701 

0.4233 

0.3783 

0.3356 

0.2956 

0.2585 

0.2245 

0.1935 

6 

0.7399 

0.6973 

0.6529 

0.6073 

0.5611 

0.5149 

0.4692 

0.4247 

0.3817 

0.3407 

7 

0.8642 

0.8342 

0.8011 

0.7651 

0.7265 

0.6858 

0.6435 

0.6001 

0.5560 

0.5118 

8 

0.9386 

0.9212 

0.9007 

0.8772 

0.8506 

0.8210 

0.7885 

0.7535 

0.7162 

0.6769 

9 

0.9760 

0.9675 

0.9569 

0.9440 

0.9287 

0.9107 

0.8899 

0.8662 

0.8398 

0.8106 

10 

0.9918 

0.9883 

0.9837 

0.9778 

0.9703 

0.9611 

0.9498 

0.9364 

0.9205 

0.9022 

11 

0.9976 

0.9964 

0.9947 

0.9924 

0.9893 

0.9852 

0.9801 

0.9736 

0.9655 

0.9558 

12 

0.9994 

0.9990 

0.9985 

0.9977 

0.9966 

0.9951 

0.9931 

0.9904 

0.9870 

0.9825 

13 

0.9999 

0.9998 

0.9996 

0.9994 

0.9991 

0.9986 

0.9979 

0.9970 

0.9957 

0.9940 

14 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9995 

0.9992 

0.9988 

0.9982 


TABLE A 

( continued ) 


n = 25 (continued)


X 

0.21 

0.22 

0.23 

0.24 

0.25 

0.26 

0.27 

0.28 

0.29 

0.30 

15 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9995 

16 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

17 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

x \ p

0.31 

0.32 

0.33 

0.34 

0.35 

0.36 

0.37 

0.38 

0.39 

0.40 

0 

0.0001 

0.0001 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

1 

0.0011 

0.0008 

0.0006 

0.0004 

0.0003 

0.0002 

0.0002 

0.0001 

0.0001 

0.0001 

2 

0.0068 

0.0051 

0.0039 

0.0029 

0.0021 

0.0016 

0.0011 

0.0008 

0.0006 

0.0004 

3 

0.0263 

0.0207 

0.0162 

0.0126 

0.0097 

0.0074 

0.0056 

0.0043 

0.0032 

0.0024 

4 

0.0746 

0.0610 

0.0496 

0.0400 

0.0320 

0.0255 

0.0201 

0.0158 

0.0123 

0.0095 

5 

0.1656 

0.1407 

0.1187 

0.0994 

0.0826 

0.0682 

0.0559 

0.0454 

0.0367 

0.0294 

6 

0.3019 

0.2657 

0.2321 

0.2013 

0.1734 

0.1483 

0.1258 

0.1060 

0.0886 

0.0736 

7 

0.4681 

0.4253 

0.3837 

0.3439 

0.3061 

0.2705 

0.2374 

0.2068 

0.1789 

0.1536 

8 

0.6361 

0.5943 

0.5518 

0.5092 

0.4668 

0.4252 

0.3848 

0.3458 

0.3086 

0.2735 

9 

0.7787 

0.7445 

0.7081 

0.6700 

0.6303 

0.5896 

0.5483 

0.5067 

0.4653 

0.4246 

10 

0.8812 

0.8576 

0.8314 

0.8025 

0.7712 

0.7375 

0.7019 

0.6645 

0.6257 

0.5858 

11 

0.9440 

0.9302 

0.9141

0.8956 

0.8746 

0.8510 

0.8249 

0.7964 

0.7654 

0.7323 

12 

0.9770 

0.9701 

0.9617 

0.9515 

0.9396 

0.9255 

0.9093 

0.8907 

0.8697 

0.8462 

13 

0.9917 

0.9888 

0.9851 

0.9804 

0.9745 

0.9674 

0.9588 

0.9485 

0.9363 

0.9222 

14 

0.9974 

0.9964 

0.9950 

0.9931 

0.9907 

0.9876 

0.9837 

0.9788 

0.9729 

0.9656 

15 

0.9993 

0.9990 

0.9985 

0.9979 

0.9971 

0.9959 

0.9944 

0.9925 

0.9900 

0.9868 

16 

0.9998 

0.9998 

0.9996 

0.9995 

0.9992 

0.9989 

0.9984 

0.9977 

0.9968 

0.9957 

17 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9994 

0.9992 

0.9988 

18 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9999 

0.9998 

0.9997 

19 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

20 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000

1.0000 

1.0000 

1.0000 

x \ p

0.41 

0.42 

0.43 

0.44 

0.45 

0.46 

0.47 

0.48 

0.49 

0.50 

0 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000

0.0000 

0.0000 

0.0000 

1 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

2 

0.0003 

0.0002 

0.0002 

0.0001 

0.0001 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

3 

0.0017 

0.0013 

0.0009 

0.0007 

0.0005 

0.0003 

0.0002 

0.0002 

0.0001 

0.0001 

4 

0.0073 

0.0055 

0.0042 

0.0031 

0.0023 

0.0017 

0.0012 

0.0009 

0.0006 

0.0005 

5 

0.0233 

0.0184 

0.0144 

0.0112 

0.0086 

0.0066 

0.0050 

0.0037 

0.0028 

0.0020 

6 

0.0606 

0.0495 

0.0401 

0.0323 

0.0258 

0.0204 

0.0160 

0.0124 

0.0096 

0.0073 

7 

0.1308 

0.1106 

0.0929 

0.0773 

0.0639 

0.0523 

0.0425 

0.0342 

0.0273 

0.0216 

8 

0.2407 

0.2103 

0.1823 

0.1569 

0.1340 

0.1135 

0.0954 

0.0795 

0.0657 

0.0539 

9 

0.3849 

0.3465 

0.3098 

0.2750 

0.2424 

0.2120 

0.1840 

0.1585 

0.1354 

0.1148 

10 

0.5452 

0.5044 

0.4637 

0.4235 

0.3843 

0.3462 

0.3098 

0.2751 

0.2426 

0.2122 

11 

0.6971 

0.6603 

0.6220 

0.5826 

0.5426 

0.5022 

0.4618 

0.4220 

0.3829 

0.3450 

12 

0.8203 

0.7920 

0.7613 

0.7285 

0.6937 

0.6571 

0.6192 

0.5801 

0.5402 

0.5000 

13 

0.9059 

0.8873 

0.8664 

0.8431

0.8173 

0.7891 

0.7587 

0.7260 

0.6914 

0.6550 

14 

0.9569 

0.9465 

0.9344 

0.9203 

0.9040 

0.8855 

0.8647 

0.8415 

0.8159 

0.7878 

15 

0.9829 

0.9780 

0.9720 

0.9647 

0.9560 

0.9457 

0.9337 

0.9197 

0.9036 

0.8852 

16 

0.9942 

0.9922 

0.9897 

0.9866 

0.9826 

0.9778 

0.9719 

0.9648 

0.9562 

0.9461 

17 

0.9983 

0.9977 

0.9968 

0.9956 

0.9942 

0.9923 

0.9898 

0.9868 

0.9830 

0.9784 

18 

0.9996 

0.9994 

0.9992 

0.9988 

0.9984 

0.9977 

0.9969 

0.9959 

0.9945 

0.9927 

19 

0.9999 

0.9999 

0.9998 

0.9997 

0.9996 

0.9995 

0.9992 

0.9989 

0.9985 

0.9980 

20 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

0.9998 

0.9998 

0.9997 

0.9995 

21 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

0.9999 

0.9999 

22 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 
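
The entries of Table A are cumulative binomial probabilities, that is, P(X <= x) for n trials with success probability p. As a minimal sketch (not part of the original tables), an entry can be reproduced by summing the individual binomial terms. The helper name `binom_cdf` is hypothetical and introduced here only for illustration; the code assumes Python 3.8 or later for `math.comb`.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """Return P(X <= x) for X ~ Binomial(n, p) by direct summation."""
    # Each term is C(n, k) * p^k * (1 - p)^(n - k), for k = 0, 1, ..., x.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


# Two entries from the n = 25 block, rounded to four places as in the table.
print(f"{binom_cdf(12, 25, 0.50):.4f}")  # 0.5000
print(f"{binom_cdf(0, 25, 0.01):.4f}")   # 0.7778
```

Direct summation is adequate for the small n tabulated here. For large n, a library routine would likely be more accurate and faster.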



TABLE B 

( continued) 




2.0 

2.2 

2.4 

2.6 

2.8 

3.0 

3.2 

3.4 

0 


135 

111 

091 

074 

061 

050 

041 

033 

1 


406 

355 

308 

267 

231 

199 

171 

147 

2 


677 

623 

570 

518 

469 

423 

380 

340 

3 


857 

819 

779 

736 

692 

647 

603 

558 

4 


947 

928 

904 

877 

848 

815 

781 

744 

5 


983 

975 

964 

951 

935 

916 

895 

871 

6 


995 

993 

988 

983 

976 

966 

955 

942 

7 


999 

998 

997 

995 

992 

988 

983 

977 

8 


1000 

1000 

999 

999 

998 

996 

994 

992 

9 




1000 

1000 

999 

999 

998 

997 

10 






1000 

1000 

1000 

999 

11 









1000 


\ 

3.6 

3.8 

4.0 

4.2 

4.4 

4.6 

4.8 

5.0 

0 


027 

022 

018 

015 

012 

010 

008 

007 

1 


126 

107 

092 

078 

066 

056 

048 

040 

2 


303 

269 

238 

210 

185 

163 

143 

125 

3 


515 

473 

433 

395 

359 

326 

294 

265 

4 


706 

668 

629 

590 

551 

513 

476 

440 

5 


844 

816 

785 

753 

720 

686 

651 

616 

6 


927 

909 

889 

867 

844 

818 

791 

762 

7 


969 

960 

949 

936 

921 

905 

887 

867 

8 


988 

984 

979 

972 

964 

955 

944 

932 

9 


996 

994 

992 

989 

985 

980 

975 

968 

10 


999 

998 

997 

996 

994 

992 

990 

986 

11 


1000 

999 

999 

999 

998 

997 

996 

995 

12 



1000 

1000 

1000 

999 

999 

999 

998 

13 






1000 

1000 

1000 

999 

14 









1000 



5.2 

5.4 

5.6 

5.8 

6.0 

6.2 

6.4 

6.6 

0 


006 

005 

004 

003 

002 

002 

002 

001 

1 


034 

029 

024 

021 

017 

015 

012 

010 

2 


109 

095 

082 

072 

062 

054 

046 

040 

3 


238 

213 

191 

170 

151 

134 

119 

105 

4 


406 

373 

342 

313 

285 

259 

235 

213 

5 


581 

546 

512 

478 

446 

414 

384 

355 

6 


732 

702 

670 

638 

606 

574 

542 

511 

7 


845 

822 

797 

771 

744 

716 

687 

658 

8 


918 

903 

886 

867 

847 

826 

803 

780 

9 


960 

951 

941 

929 

916 

902 

886 

869 

10 


982 

977 

972 

965 

957 

949 

939 

927 

11 


993 

990 

988 

984 

980 

975 

969 

963 

12 


997 

996 

995 

993 

991 

989 

986 

982 

13 


999 

999 

998 

997 

996 

995 

994 

992 

14 


1000 

999 

999 

999 

999 

998 

997 

997 

15 



1000 

1000 

1000 

999 

999 

999 

999 

16 






1000 

1000 

1000 

999 

17 









1000 



6.8 

7.0 

7.2 

7.4 

7.6 

7.8 

8.0 

8.5 

0 


001 

001 

001 

001 

001 

000 

000 

000 

1 


009 

007 

006 

005 

004 

004 

003 

002 

2 


034 

030 

025 

022 

019 

016 

014 

009 

3 


093 

082 

072 

063 

055 

048 

042 

030 

4 


192 

173 

156 

140 

125 

112 

100 

074 

5 


327 

301 

276 

253 

231 

210 

191 

150 

6 


480 

450 

420 

392 

365 

338 

313 

256 

7 


628 

599 

569 

539 

510 

481 

453 

386 

8 


755 

729 

703 

676 

648 

620 

593 

523 





TABLE B 

(continued) 


\ 

X 

13.0 

13.5 

14.0 

14.5 

15 

16 

17 

18 

22 

992 

989 

983 

976 

967 

942 

905 

855 

23 

996 

994 

991 

986 

981 

963 

937 

899 

24 

998 

997 

995 

992 

989 

978 

959 

932 

25 

999 

998 

997 

996 

994 

987 

975 

955 

26 

1000 

999 

999 

998 

997 

993 

985 

972 

27 


1000 

999 

999 

998 

996 

991 

983 

28 



1000 

999 

999 

998 

995 

990 

29 




1000 

1000 

999 

997 

994 

30 






999 

999 

997 

31 






1000 

999 

998 

32 







1000 

999 

33 








1000 

X 

X 

19 

20 

21 

22 

23 

24 

25 


6 

001 

000 

000 

000 

000 

000 

000 


7 

002 

001 

000 

000 

000 

000 

000 


8 

004 

002 

001 

001 

000 

000 

000 


9 

009 

005 

003 

002 

001 

000 

000 


10 

018 

011 

006 

004 

002 

001 

001 


11 

035 

021 

013 

008 

004 

003 

001 


12 

061 

039 

025 

015 

009 

005 

003 


13 

098 

066 

043 

028 

017 

011

006 


14 

150 

105 

072 

048 

031 

020 

012 


15 

215 

157 

111 

077 

052 

034 

022 


16 

292 

221 

163 

117 

082 

056 

038 


17 

378 

297 

227 

169 

123 

087 

060 


18 

469 

381 

302 

232 

175 

128 

092 


19 

561 

470 

384 

306 

238 

180 

134 


20 

647 

559 

471 

387 

310 

243 

185 


21 

725 

644 

558 

472 

389 

314 

247 


22 

793 

721 

640 

556 

472 

392 

318 


23 

849 

787 

716 

637 

555 

473 

394 


24 

893 

843 

782 

712 

635 

554 

473 


25 

927 

888 

838 

777 

708 

632 

553 


26 

951 

922 

883 

832 

772 

704 

629 


27 

969 

948 

917 

877 

827 

768 

700 


28 

980 

966 

944 

913 

873 

823 

763 


29 

988 

978 

963 

940 

908 

868 

818 


30 

993 

987 

976 

959 

936 

904 

863 


31 

996 

992 

985 

973 

956 

932 

900 


32 

998 

995 

991 

983 

971 

953 

929 


33 

999 

997 

994 

989 

981 

969 

950 


34 

999 

999 

997 

994 

988 

979 

966 


35 

1000 

999 

998 

996 

993 

987 

978 


36 


1000 

999 

998 

996 

992 

985 


37 



999 

999 

997 

995 

991 


38 



1000 

999 

999 

997 

994 


39 




1000 

999 

998 

997 


40 





1000 

999 

998 


41 






999 

999 


42 






1000 

999 


43 







1000 
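
The entries of Table B can be checked numerically. The following is a minimal sketch, assuming (as the values suggest) that each entry is the cumulative Poisson probability P(X ≤ x) for the column mean, multiplied by 1000. The function name `poisson_cdf` is illustrative and does not come from the text.

```python
import math


def poisson_cdf(x: int, lam: float) -> float:
    """Return P(X <= x) for a Poisson random variable with mean lam."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for k in range(1, x + 1):
        term *= lam / k  # P(X = k) obtained from P(X = k - 1)
        total += term
    return total


# Mean 2.0, x = 2: the table lists 677, i.e. 0.677.
print(round(poisson_cdf(2, 2.0), 3))  # prints 0.677
```

Building each term from the previous one avoids computing large factorials directly, which keeps the sketch usable for the larger means in the table.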






TABLE C 

Normal curve areas 
P(z < z0)


Entries in the body of the table
are areas between -∞ and z.




z 

- 0.09 

- 0.08 

- 0.07 

- 0.06 

- 0.05 

- 0.04 

- 0.03 

- 0.02 

- 0.01 

0.00 

z 

- 3.80 

.0001 

.0001 

.0001 

.0001 

.0001 

.0001 

.0001 

.0001 

.0001 

.0001 

- 3.80 

- 3.70 

.0001 

.0001 

.0001 

.0001 

.0001 

.0001 

.0001 

.0001 

.0001 

.0001 

- 3.70 

- 3.60 

.0001 

.0001 

.0001 

.0001 

.0001 

.0001 

.0001 

.0001 

.0002 

.0002 

- 3.60 

- 3.50 

.0002 

.0002 

.0002 

.0002 

.0002 

.0002 

.0002 

.0002 

.0002 

.0002 

- 3.50 

- 3.40 

.0002 

.0003 

.0003 

.0003 

.0003 

.0003 

.0003 

.0003 

.0003 

.0003 

- 3.40 

- 3.30 

.0003 

.0004 

.0004 

.0004 

.0004 

.0004 

.0004 

.0005 

.0005 

.0005 

- 3.30 

- 3.20 

.0005 

.0005 

.0005 

.0006 

.0006 

.0006 

.0006 

.0006 

.0007 

.0007 

- 3.20 

- 3.10 

.0007 

.0007 

.0008 

.0008 

.0008 

.0008 

.0009 

.0009 

.0009 

.0010 

- 3.10 

- 3.00 

.0010 

.0010 

.0011 

.0011 

.0011 

.0012 

.0012 

.0013 

.0013 

.0013 

- 3.00 

- 2.90 

.0014 

.0014 

.0015 

.0015 

.0016 

.0016 

.0017 

.0018 

.0018 

.0019 

- 2.90 

- 2.80 

.0019 

.0020 

.0021 

.0021 

.0022 

.0023 

.0023 

.0024 

.0025 

.0026 

- 2.80 

- 2.70 

.0026 

.0027 

.0028 

.0029 

.0030 

.0031 

.0032 

.0033 

.0034 

.0035 

- 2.70 

- 2.60 

.0036 

.0037 

.0038 

.0039 

.0040 

.0041 

.0043 

.0044 

.0045 

.0047 

- 2.60 

- 2.50 

.0048 

.0049 

.0051 

.0052 

.0054 

.0055 

.0057 

.0059 

.0060 

.0062 

- 2.50 

- 2.40 

.0064 

.0066 

.0068 

.0069 

.0071 

.0073 

.0075 

.0078 

.0080 

.0082 

- 2.40 

- 2.30 

.0084 

.0087 

.0089 

.0091 

.0094 

.0096 

.0099 

.0102 

.0104 

.0107 

- 2.30 

- 2.20 

.0110 

.0113 

.0116 

.0119 

.0122 

.0125 

.0129 

.0132 

.0136 

.0139 

- 2.20 

- 2.10 

.0143 

.0146 

.0150 

.0154 

.0158 

.0162 

.0166 

.0170 

.0174 

.0179 

- 2.10 

- 2.00 

.0183 

.0188 

.0192 

.0197 

.0202 

.0207 

.0212 

.0217 

.0222 

.0228 

- 2.00 

- 1.90 

.0233 

.0239 

.0244 

.0250 

.0256 

.0262 

.0268 

.0274 

.0281 

.0287 

- 1.90 

- 1.80 

.0294 

.0301 

.0307 

.0314 

.0322 

.0329

.0336 

.0344 

.0351 

.0359 

- 1.80 

- 1.70 

.0367 

.0375 

.0384 

.0392 

.0401 

.0409 

.0418 

.0427 

.0436 

.0446 

- 1.70 

- 1.60 

.0455 

.0465 

.0475 

.0485 

.0495 

.0505 

.0516 

.0526 

.0537 

.0548 

- 1.60 

- 1.50 

.0559 

.0571 

.0582 

.0594 

.0606 

.0618 

.0630 

.0643 

.0655 

.0668 

- 1.50 

- 1.40 

.0681 

.0694 

.0708 

.0721 

.0735 

.0749 

.0764 

.0778 

.0793 

.0808 

- 1.40 

- 1.30 

.0823 

.0838 

.0853 

.0869 

.0885 

.0901 

.0918 

.0934 

.0951 

.0968 

- 1.30 

- 1.20 

.0985 

.1003 

.1020 

.1038 

.1056 

.1075 

.1093 

.1112 

.1131 

.1151 

- 1.20 

- 1.10 

.1170 

.1190 

.1210 

.1230 

.1251 

.1271 

.1292 

.1314 

.1335 

.1357 

- 1.10 

- 1.00 

.1379 

.1401 

.1423 

.1446 

.1469 

.1492 

.1515 

.1539 

.1562 

.1587 

- 1.00 

- 0.90 

.1611 

.1635 

.1660 

.1685 

.1711 

.1736 

.1762 

.1788 

.1814 

.1841 

- 0.90 

- 0.80 

.1867 

.1894 

.1922 

.1949 

.1977 

.2005 

.2033 

.2061 

.2090 

.2119 

- 0.80 

- 0.70 

.2148 

.2177 

.2206 

.2236 

.2266 

.2296 

.2327 

.2358 

.2389 

.2420 

- 0.70 

- 0.60 

.2451 

.2483 

.2514 

.2546 

.2578 

.2611 

.2643 

.2676 

.2709 

.2743 

- 0.60 

- 0.50 

.2776 

.2810 

.2843 

.2877 

.2912 

.2946 

.2981 

.3015 

.3050 

.3085 

- 0.50 

- 0.40 

.3121 

.3156 

.3192 

.3228 

.3264 

.3300 

.3336 

.3372 

.3409 

.3446 

- 0.40 

- 0.30 

.3483 

.3520 

.3557 

.3594 

.3632 

.3669 

.3707 

.3745 

.3783 

.3821 

- 0.30 

- 0.20 

.3859 

.3897 

.3936 

.3974 

.4013 

.4052 

.4090 

.4129 

.4168 

.4207 

- 0.20 

- 0.10 

.4247 

.4286 

.4325 

.4364 

.4404 

.4443 

.4483 

.4522 

.4562 

.4602 

- 0.10 

0.00 

.4641 

.4681 

.4721 

.4761 

.4801 

.4840 

.4880 

.4920 

.4960 

.5000 

0.00 




TABLE C 

( continued ) 


z 

0.00 

0.01 

0.02 

0.03 

0.04 

0.05 

0.06 

0.07 

0.08 

0.09 

z 

0.00 

.5000 

.5040 

.5080 

.5120 

.5160 

.5199 

.5239 

.5279 

.5319 

.5359 

0.00 

0.10 

.5398 

.5438 

.5478 

.5517 

.5557 

.5596 

.5636 

.5675 

.5714 

.5753 

0.10 

0.20 

.5793 

.5832 

.5871 

.5910 

.5948 

.5987 

.6026 

.6064 

.6103 

.6141 

0.20 

0.30 

.6179 

.6217 

.6255 

.6293 

.6331 

.6368 

.6406

.6443 

.6480 

.6517 

0.30 

0.40 

.6554 

.6591 

.6628 

.6664 

.6700 

.6736 

.6772 

.6808 

.6844 

.6879 

0.40 

0.50 

.6915 

.6950 

.6985 

.7019 

.7054 

.7088

.7123

.7157

.7190

.7224 

0.50 

0.60 

.7257 

.7291 

.7324 

.7357 

.7389 

.7422 

.7454 

.7486

.7517

.7549

0.60 

0.70 

.7580 

.7611 

.7642 

.7673 

.7704 

.7734 

.7764 

.7794 

.7823 

.7852 

0.70 

0.80 

.7881 

.7910 

.7939 

.7967 

.7995 

.8023 

.8051 

.8078 

.8106 

.8133 

0.80 

0.90 

.8159 

.8186 

.8212 

.8238 

.8264 

.8289 

.8315 

.8340 

.8365 

.8389 

0.90 

1.00 

.8413 

.8438 

.8461 

.8485 

.8508 

.8531 

.8554 

.8577 

.8599 

.8621 

1.00 

1.10 

.8643 

.8665 

.8686 

.8708 

.8729 

.8749 

.8770 

.8790 

.8810 

.8830 

1.10 

1.20 

.8849 

.8869 

.8888 

.8907 

.8925 

.8944 

.8962 

.8980 

.8997 

.9015 

1.20 

1.30 

.9032 

.9049 

.9066 

.9082 

.9099 

.9115 

.9131 

.9147 

.9162 

.9177 

1.30 

1.40 

.9192 

.9207 

.9222 

.9236 

.9251 

.9265 

.9279 

.9292 

.9306 

.9319 

1.40 

1.50 

.9332 

.9345 

.9357 

.9370 

.9382 

.9394 

.9406 

.9418 

.9429 

.9441 

1.50 

1.60 

.9452 

.9463 

.9474 

.9484 

.9495 

.9505 

.9515 

.9525 

.9535 

.9545 

1.60 

1.70 

.9554 

.9564 

.9573 

.9582 

.9591 

.9599 

.9608 

.9616 

.9625 

.9633 

1.70 

1.80 

.9641 

.9649 

.9656 

.9664 

.9671 

.9678 

.9686 

.9693 

.9699 

.9706 

1.80 

1.90 

.9713 

.9719 

.9726 

.9732 

.9738 

.9744 

.9750 

.9756 

.9761 

.9767 

1.90 

2.00 

.9772 

.9778 

.9783 

.9788 

.9793 

.9798 

.9803 

.9808 

.9812 

.9817 

2.00 

2.10 

.9821 

.9826 

.9830 

.9834 

.9838 

.9842 

.9846 

.9850 

.9854 

.9857 

2.10 

2.20 

.9861 

.9864 

.9868 

.9871 

.9875 

.9878 

.9881 

.9884 

.9887 

.9890 

2.20 

2.30 

.9893 

.9896 

.9898 

.9901 

.9904 

.9906 

.9909 

.9911 

.9913 

.9916 

2.30 

2.40 

.9918 

.9920 

.9922 

.9925 

.9927 

.9929 

.9931 

.9932 

.9934 

.9936 

2.40 

2.50 

.9938 

.9940 

.9941 

.9943 

.9945 

.9946 

.9948 

.9949 

.9951 

.9952 

2.50 

2.60 

.9953 

.9955 

.9956 

.9957 

.9959 

.9960 

.9961 

.9962 

.9963 

.9964 

2.60 

2.70 

.9965 

.9966 

.9967 

.9968 

.9969 

.9970 

.9971 

.9972 

.9973 

.9974 

2.70 

2.80 

.9974 

.9975 

.9976 

.9977 

.9977 

.9978 

.9979 

.9979 

.9980 

.9981 

2.80 

2.90 

.9981 

.9982 

.9982 

.9983 

.9984 

.9984 

.9985 

.9985 

.9986 

.9986 

2.90 

3.00 

.9987 

.9987 

.9987 

.9988 

.9988 

.9989 

.9989 

.9989 

.9990 

.9990 

3.00 

3.10 

.9990 

.9991 

.9991 

.9991 

.9992 

.9992 

.9992 

.9992 

.9993 

.9993 

3.10 

3.20 

.9993 

.9993 

.9994 

.9994 

.9994 

.9994 

.9994 

.9995 

.9995 

.9995 

3.20 

3.30 

.9995 

.9995 

.9995 

.9996 

.9996 

.9996 

.9996 

.9996 

.9996 

.9997 

3.30 

3.40 

.9997 

.9997 

.9997 

.9997 

.9997 

.9997 

.9997 

.9997 

.9997 

.9998 

3.40 

3.50 

.9998 

.9998 

.9998 

.9998 

.9998 

.9998 

.9998 

.9998 

.9998 

.9998 

3.50 

3.60 

.9998 

.9998 

.9999 

.9999 

.9999 

.9999 

.9999 

.9999 

.9999 

.9999 

3.60 

3.70 

.9999 

.9999 

.9999 

.9999 

.9999 

.9999 

.9999 

.9999 

.9999 

.9999 

3.70 

3.80 

.9999 

.9999 

.9999 

.9999 

.9999 

.9999 

.9999 

.9999 

.9999 

.9999 

3.80 
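
The areas in Table C can be reproduced with the error function. This is a minimal sketch using only the Python standard library; the function name `normal_cdf` is illustrative and does not come from the text.

```python
import math


def normal_cdf(z: float) -> float:
    """Return the area under the standard normal curve between -infinity and z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Row 1.90, column 0.06 of Table C lists .9750.
print(round(normal_cdf(1.96), 4))  # prints 0.975
# Row -1.00, column 0.00 lists .1587.
print(round(normal_cdf(-1.00), 4))  # prints 0.1587
```

To read the table in the other direction (finding z for a given area), search the body for the nearest area and combine the row and column labels.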





TABLE D 

85967 

73152 

14511 

85285 

36009 

95892 

36962 

67835 

63314 

50162 


07483 

51453 

11649 

86348 

76431 

81594 

95848 

36738 

25014 

15460 

Random digits

96283 

01898 

61414 

83525 

04231 

13604 

75339 

11730 

85423 

60698 


49174 

12074 

98551 

37895 

93547 

24769 

09404 

76548 

05393 

96770 


97366 

39941 

21225 

93629 

19574 

71565 

33413 

56087 

40875 

13351 


90474 

41469 

16812 

81542 

81652 

45554 

27931 

93994 

22375 

00953 


28599 

64109 

09497 

76235 

41383 

31555 

12639 

00619 

22909 

29563 


25254 

16210 

89717 

65997 

82667 

74624 

36348 

44018 

64732 

93589 


28785 

02760 

24359 

99410 

77319 

73408 

58993 

61098 

04393 

48245 


84725 

86576 

86944 

93296 

10081 

82454 

76810 

52975 

10324 

15457 


41059 

66456 

47679 

66810 

15941 

84602 

14493 

65515 

19251 

41642 


67434 

41045 

82830 

47617 

36932 

46728 

71183 

36345 

41404 

81110 


72766 

68816 

37643 

19959 

57550 

49620 

98480 

25640 

67257 

18671 


92079 

46784 

66125 

94932 

64451 

29275 

57669 

66658 

30818 

58353 


29187 

40350 

62533 

73603 

34075 

16451 

42885 

03448 

37390 

96328 


74220 

17612 

65522 

80607 

19184 

64164 

66962 

82310 

18163 

63495 


03786 

02407 

06098 

92917 

40434 

60602 

82175 

04470 

78754 

90775 


75085 

55558 

15520 

27038 

25471 

76107 

90832 

10819 

56797 

33751 


09161 

33015 

19155 

11715 

00551 

24909 

31894 

37774 

37953 

78837 


75707 

48992 

64998 

87080 

39333 

00767

45637 

12538 

67439 

94914 


21333 

48660 

31288 

00086 

79889 

75532 

28704 

62844 

92337 

99695 

65626 

50061 

42539 

14812 

48895 

11196 

34335 

60492 

70650 

51108 

84380 

07389 

87891 

76255 

89604 

41372 

10837 

66992 

93183 

56920 

46479 

32072 

80083 

63868 

70930 

89654 

05359 

47196 

12452 

38234 

59847 

97197 

55147 

76639 

76971 

55928 

36441 

95141 

42333 

67483 


31416 

11231 

27904 

57383 

31852 

69137 

96667 

14315 

01007 

31929 

82066 

83436 

67914 

21465 

99605 

83114 

97885 

74440 

99622 

87912 

01850 

42782 

39202 

18582 

46214 

99228 

79541 

78298 

75404 

63648 

32315 

89276 

89582 

87138 

16165 

15984 

21466 

63830 

30475 

74729 

59388 

42703 

55198 

80380 

67067 

97155 

34160 

85019 

03527 

78140 

58089 

27632 

50987 

91373 

07736 

20436 

96130 

73483 

85332 

24384 

61705 

57285 

30392 

23660 

75841 

21931 

04295 

00875 

09114 

32101 

18914 

98982 

60199 

99275 

41967 

35208 

30357 

76772 

92656 

62318 

11965 

94089 

34803 

48941 

69709 

16784 

44642 

89761 

66864 

62803 

85251 

48111 

80936 

81781 

93248 

67877 

16498 

31924 

51315 

79921 


66121 

96986 

84844 

93873 

46352 

92183 

51152 

85878 

30490 

15974 

53972 

96642 

24199 

58080 

35450 

03482 

66953 

49521 

63719 

57615 

14509 

16594 

78883 

43222 

23093 

58645 

60257 

89250 

63266 

90858 

37700 

07688 

65533 

72126 

23611 

93993 

01848 

03910 

38552 

17472 

85466 

59392 

72722 

15473 

73295 

49759 

56157 

60477 

83284 

56367 

52969 

55863 

42312 

67842 

05673 

91878 

82738 

36563 

79540 

61935 

42744 

68315 

17514 

02878 

97291 

74851 

42725 

57894 

81434 

62041 

26140 

13336 

67726 

61876 

29971 

99294 

96664 

52817 

90039 

53211 

95589 

56319 

14563 

24071 

06916 

59555 

18195 

32280 

79357 

04224 

39113 

13217 

59999 

49952 

83021 

47709 

53105 

19295 

88318 

41626 

41392 

17622 

18994 

98283 

07249 

52289 

24209 

91139 

30715 

06604 

54684 

53645 

79246 

70183 

87731 

19185 

08541 

33519 

07223 

97413 

89442 

61001 

36658 

57444 

95388 

36682 

38052 

46719 

09428 

94012 

36751 

16778 

54888 

15357 

68003 

43564 

90976 

58904 

40512 

07725 

98159 

02564 

21416 

74944 

53049 

88749 

02865 

25772 

89853 

88714 





TABLE E 

Percentiles of the 
t distribution 
P(t ≤ t0)


P(t10 < 2.2281) = 0.975


df 

t0.90

t0.95

t0.975

t0.99

t0.995

1 

3.078 

6.3138 

12.706 

31.821 

63.657 

2 

1.886 

2.9200 

4.3027 

6.965 

9.9248 

3 

1.638 

2.3534 

3.1825 

4.541 

5.8409 

4 

1.533 

2.1318 

2.7764 

3.747 

4.6041 

5 

1.476 

2.0150 

2.5706 

3.365 

4.0321 

6 

1.440 

1.9432 

2.4469 

3.143 

3.7074 

7 

1.415 

1.8946 

2.3646 

2.998 

3.4995 

8 

1.397 

1.8595 

2.3060 

2.896 

3.3554 

9 

1.383 

1.8331 

2.2622 

2.821 

3.2498 

10 

1.372 

1.8125 

2.2281 

2.764 

3.1693 

11 

1.363 

1.7959 

2.2010 

2.718 

3.1058 

12 

1.356 

1.7823 

2.1788 

2.681 

3.0545 

13 

1.350 

1.7709 

2.1604 

2.650 

3.0123 

14 

1.345 

1.7613 

2.1448 

2.624 

2.9768 

15 

1.341 

1.7530 

2.1315 

2.602 

2.9467 

16 

1.337 

1.7459 

2.1199 

2.583 

2.9208 

17 

1.333 

1.7396 

2.1098 

2.567 

2.8982 

18 

1.330 

1.7341 

2.1009 

2.552 

2.8784 

19 

1.328 

1.7291 

2.0930 

2.539 

2.8609 

20 

1.325 

1.7247 

2.0860 

2.528 

2.8453 

21 

1.323 

1.7207 

2.0796 

2.518 

2.8314 

22 

1.321 

1.7171 

2.0739 

2.508 

2.8188 

23 

1.319 

1.7139 

2.0687 

2.500 

2.8073 

24 

1.318 

1.7109 

2.0639 

2.492 

2.7969 

25 

1.316 

1.7081 

2.0595 

2.485 

2.7874 

26 

1.315 

1.7056 

2.0555 

2.479 

2.7787 

27 

1.314 

1.7033 

2.0518 

2.473 

2.7707 

28 

1.313 

1.7011 

2.0484 

2.467 

2.7633 

29 

1.311 

1.6991 

2.0452 

2.462 

2.7564 

30 

1.310

1.6973 

2.0423 

2.457 

2.7500 

35 

1.3062 

1.6896 

2.0301 

2.438 

2.7239 

40 

1.3031 

1.6839 

2.0211 

2.423 

2.7045 

45 

1.3007 

1.6794 

2.0141 

2.412 

2.6896 

50 

1.2987 

1.6759 

2.0086 

2.403 

2.6778 

60 

1.2959 

1.6707 

2.0003 

2.390 

2.6603 

70 

1.2938 

1.6669 

1.9945 

2.381 

2.6480 

80 

1.2922 

1.6641 

1.9901 

2.374 

2.6388 

90 

1.2910 

1.6620 

1.9867 

2.368 

2.6316 

100 

1.2901 

1.6602 

1.9840 

2.364 

2.6260 

120 

1.2887 

1.6577 

1.9799 

2.358 

2.6175 

140 

1.2876 

1.6558 

1.9771 

2.353 

2.6114 

160 

1.2869 

1.6545 

1.9749 

2.350 

2.6070 

180 

1.2863 

1.6534 

1.9733 

2.347 

2.6035 

200 

1.2858 

1.6525 

1.9719 

2.345 

2.6006 

∞

1.282 

1.645 

1.96 

2.326 

2.576 
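
The worked entry P(t10 < 2.2281) = 0.975 can be checked by integrating the t density numerically. This is a rough sketch using only the Python standard library and composite Simpson's rule. The names `t_pdf` and `t_cdf` and the step count are illustrative choices, not part of the text, and a statistics library would normally be used instead.

```python
import math


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t0: float, df: int, steps: int = 2000) -> float:
    """Approximate P(t <= t0) by Simpson's rule on [0, |t0|]; steps must be even."""
    upper = abs(t0)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        # Simpson weights: 4 on odd interior points, 2 on even ones.
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |t0|
    # The density is symmetric about 0, so half the mass lies below 0.
    return 0.5 + area if t0 >= 0 else 0.5 - area


print(round(t_cdf(2.2281, 10), 3))  # close to the tabled 0.975
```

Because `math.gamma` overflows for very large arguments, this sketch is only suitable for moderate degrees of freedom such as those listed in the table.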





TABLE F 
Percentiles of 
the chi-square 
distribution 
P(X² < χ²)



df 

χ²0.005

χ²0.025

χ²0.05

χ²0.90

χ²0.95

χ²0.975

χ²0.99

χ²0.995

1 

0.0000393 

0.000982 

0.00393 

2.706 

3.841 

5.024 

6.635 

7.879 

2 

0.0100 

0.0506 

0.103 

4.605 

5.991 

7.378 

9.210 

10.597 

3 

0.0717 

0.216 

0.352 

6.251 

7.815 

9.348 

11.345 

12.838 

4 

0.207 

0.484 

0.711 

7.779 

9.488 

11.143 

13.277 

14.860 

5 

0.412 

0.831 

1.145 

9.236 

11.070 

12.832 

15.086 

16.750 

6 

0.676 

1.237 

1.635 

10.645 

12.592 

14.449 

16.812 

18.548 

7 

0.989 

1.690 

2.167 

12.017 

14.067 

16.013 

18.475 

20.278 

8 

1.344 

2.180 

2.733 

13.362 

15.507 

17.535 

20.090 

21.955 

9 

1.735 

2.700 

3.325 

14.684 

16.919 

19.023 

21.666 

23.589 

10 

2.156 

3.247 

3.940 

15.987 

18.307 

20.483 

23.209 

25.188 

11 

2.603 

3.816 

4.575 

17.275 

19.675 

21.920 

24.725 

26.757 

12 

3.074 

4.404 

5.226 

18.549 

21.026 

23.336 

26.217 

28.300 

13 

3.565 

5.009 

5.892 

19.812 

22.362 

24.736 

27.688 

29.819 

14 

4.075 

5.629 

6.571 

21.064 

23.685 

26.119 

29.141 

31.319 

15 

4.601 

6.262 

7.261 

22.307 

24.996 

27.488 

30.578 

32.801 

16 

5.142 

6.908 

7.962 

23.542 

26.296 

28.845 

32.000 

34.267 

17 

5.697 

7.564 

8.672 

24.769 

27.587 

30.191 

33.409 

35.718 

18 

6.265 

8.231 

9.390 

25.989 

28.869 

31.526 

34.805 

37.156 

19 

6.844 

8.907 

10.117 

27.204 

30.144 

32.852 

36.191 

38.582 

20 

7.434 

9.591 

10.851 

28.412 

31.410 

34.170 

37.566 

39.997 

21 

8.034 

10.283 

11.591 

29.615 

32.671 

35.479 

38.932 

41.401 

22 

8.643 

10.982 

12.338 

30.813 

33.924 

36.781 

40.289 

42.796 

23 

9.260 

11.688 

13.091 

32.007 

35.172 

38.076 

41.638 

44.181 

24 

9.886 

12.401 

13.848 

33.196 

36.415 

39.364 

42.980 

45.558 

25 

10.520 

13.120 

14.611 

34.382 

37.652 

40.646 

44.314 

46.928 

26 

11.160 

13.844 

15.379 

35.563 

38.885 

41.923 

45.642 

48.290 

27 

11.808 

14.573 

16.151 

36.741 

40.113 

43.194 

46.963 

49.645 

28 

12.461 

15.308 

16.928 

37.916 

41.337 

44.461 

48.278 

50.993 

29 

13.121 

16.047

17.708 

39.087 

42.557 

45.722 

49.588 

52.336 

30 

13.787 

16.791 

18.493

40.256 

43.773 

46.979 

50.892 

53.672 

35 

17.192 

20.569 

22.465 

46.059 

49.802 

53.203 

57.342 

60.275 

40 

20.707 

24.433 

26.509 

51.805 

55.758 

59.342 

63.691 

66.766 

45 

24.311 

28.366 

30.612 

57.505 

61.656 

65.410 

69.957 

73.166 

50 

27.991 

32.357 

34.764 

63.167 

67.505 

71.420 

76.154 

79.490 

60 

35.535 

40.482 

43.188 

74.397 

79.082 

83.298 

88.379 

91.952 

70 

43.275 

48.758 

51.739 

85.527 

90.531 

95.023 

100.425 

104.215 

80 

51.172 

57.153 

60.391 

96.578 

101.879 

106.629 

112.329 

116.321 

90 

59.196 

65.647 

69.126 

107.565 

113.145 

118.136 

124.116 

128.299 

100 

67.328 

74.222 

77.929 

118.498 

124.342 

129.561 

135.807 

140.169 






TABLE G 

Percentiles of the 
F distribution 
P(F < F 0 ) 



F0.90


Numerator degrees of freedom

Denominator
degrees of
freedom

1

2

3

4

5

6

7

8

9

1 

39.86 

49.50 

53.59 

55.83 

57.24 

58.20 

58.91

59.44 

59.86 

2 

8.53 

9.00 

9.16 

9.24 

9.29

9.33

9.35

9.37

9.38 

3 

5.54 

5.46 

5.39 

5.34 

5.31 

5.28 

5.27 

5.25 

5.24 

4 

4.54 

4.32

4.19

4.11

4.05

4.01 

3.98 

3.95 

3.94 

5 

4.06 

3.78 

3.62 

3.52 

3.45 

3.40 

3.37 

3.34 

3.32 

6 

3.78 

3.46 

3.29 

3.18 

3.11 

3.05 

3.01 

2.98 

2.96 

7 

3.59 

3.26 

3.07 

2.96 

2.88 

2.83 

2.78 

2.75 

2.72 

8 

3.46 

3.11 

2.92

2.81 

2.73 

2.67 

2.62 

2.59 

2.56 

9 

3.36 

3.01 

2.81 

2.69 

2.61 

2.55 

2.51 

2.47 

2.44 

10 

3.29 

2.92 

2.73 

2.61 

2.52

2.46

2.41 

2.38 

2.35 

11 

3.23 

2.86 

2.66 

2.54 

2.45 

2.39 

2.34 

2.30 

2.27 

12 

3.18 

2.81 

2.61 

2.48

2.39

2.33

2.28 

2.24 

2.21 

13 

3.14 

2.76

2.56 

2.43 

2.35 

2.28 

2.23 

2.20 

2.16 

14 

3.10 

2.73 

2.52 

2.39

2.31

2.24

2.19

2.15 

2.12 

15 

3.07 

2.70 

2.49 

2.36

2.27 

2.21 

2.16 

2.12 

2.09 

16 

3.05 

2.67 

2.46 

2.33 

2.24 

2.18 

2.13 

2.09 

2.06 

17 

3.03 

2.64 

2.44

2.31 

2.22 

2.15 

2.10 

2.06 

2.03 

18 

3.01 

2.62

2.42 

2.29 

2.20 

2.13 

2.08

2.04 

2.00 

19 

2.99 

2.61 

2.40 

2.27 

2.18 

2.11 

2.06 

2.02 

1.98 

20 

2.97 

2.59 

2.38 

2.25 

2.16 

2.09 

2.04 

2.00 

1.96 

21 

2.96 

2.57 

2.36 

2.23 

2.14 

2.08 

2.02 

1.98 

1.95 

22 

2.95 

2.56 

2.35

2.22 

2.13 

2.06 

2.01 

1.97 

1.93 

23 

2.94 

2.55 

2.34 

2.21 

2.11 

2.05 

1.99 

1.95 

1.92 

24 

2.93 

2.54 

2.33 

2.19 

2.10 

2.04 

1.98 

1.94 

1.91 

25 

2.92

2.53

2.32

2.18

2.09

2.02

1.97 

1.93 

1.89 

26 

2.91 

2.52 

2.31 

2.17 

2.08 

2.01 

1.96 

1.92 

1.88 

27 

2.90 

2.51 

2.30 

2.17 

2.07 

2.00 

1.95 

1.91

1.87 

28 

2.89 

2.50 

2.29 

2.16 

2.06

2.00 

1.94 

1.90 

1.87 

29 

2.89 

2.50 

2.28 

2.15 

2.06

1.99 

1.93 

1.89 

1.86 

30 

2.88 

2.49 

2.28 

2.14 

2.05 

1.98

1.93 

1.88 

1.85 

40 

2.84 

2.44 

2.23 

2.09 

2.00 

1.93 

1.87 

1.83 

1.79 

60 

2.79 

2.39 

2.18 

2.04 

1.95 

1.87 

1.82 

1.77 

1.74 

120 

2.75 

2.35

2.13 

1.99 

1.90 

1.82 

1.77 

1.72 

1.68 

∞

2.71 

2.30 

2.08 

1.94 

1.85 

1.77 

1.72 

1.67 

1.63 



TABLE G 

( continued ) 


F0.90


Numerator degrees of freedom

Denominator
degrees of
freedom

10 

12 

15 

20 

24 

30 

40 

60 

120 

∞

1 

60.19 

60.71 

61.22 

61.74 

62.00 

62.26 

62.53 

62.79

63.06 

63.33 

2 

9.39 

9.41 

9.42

9.44 

9.45 

9.46 

9.47 

9.47 

9.48 

9.49 

3 

5.23 

5.22 

5.20 

5.18 

5.18 

5.17 

5.16 

5.15 

5.14 

5.13 

4 

3.92 

3.90 

3.87 

3.84 

3.83 

3.82 

3.80 

3.79 

3.78 

3.76 

5 

3.30 

3.27 

3.24 

3.21 

3.19

3.17

3.16 

3.14 

3.12 

3.10 

6 

2.94 

2.90 

2.87 

2.84 

2.82

2.80

2.78 

2.76 

2.74 

2.72 

7 

2.70 

2.67 

2.63 

2.59 

2.58 

2.56 

2.54 

2.51 

2.49 

2.47 

8 

2.54 

2.50 

2.46 

2.42 

2.40 

2.38

2.36 

2.34 

2.32 

2.29 

9 

2.42 

2.38

2.34

2.30

2.28

2.25 

2.23 

2.21 

2.18 

2.16 

10 

2.32 

2.28 

2.24

2.20

2.18

2.16

2.13 

2.11 

2.08 

2.06 

11 

2.25 

2.21 

2.17 

2.12

2.10 

2.08 

2.05 

2.03 

2.00 

1.97 

12 

2.19 

2.15 

2.10 

2.06 

2.04 

2.01 

1.99 

1.96 

1.93 

1.90 

13 

2.14 

2.10 

2.05

2.01 

1.98 

1.96 

1.93 

1.90 

1.88 

1.85 

14 

2.10 

2.05 

2.01 

1.96 

1.94 

1.91 

1.89 

1.86 

1.83 

1.80

15

2.06

2.02 

1.97 

1.92 

1.90 

1.87 

1.85 

1.82 

1.79 

1.76 

16 

2.03 

1.99 

1.94

1.89

1.87 

1.84 

1.81 

1.78 

1.75 

1.72 

17 

2.00 

1.96 

1.91 

1.86 

1.84 

1.81 

1.78 

1.75 

1.72 

1.69 

18 

1.98 

1.93 

1.89

1.84 

1.81 

1.78 

1.75 

1.72 

1.69 

1.66 

19 

1.96 

1.91 

1.86 

1.81 

1.79 

1.76 

1.73 

1.70 

1.67 

1.63 

20 

1.94 

1.89 

1.84 

1.79 

1.77 

1.74 

1.71 

1.68 

1.64 

1.61 

21 

1.92 

1.87 

1.83 

1.78 

1.75 

1.72 

1.69 

1.66 

1.62 

1.59

22 

1.90 

1.86 

1.81 

1.76 

1.73 

1.70 

1.67 

1.64 

1.60 

1.57 

23 

1.89 

1.84 

1.80 

1.74 

1.72 

1.69 

1.66

1.62 

1.59 

1.55 

24 

1.88 

1.83 

1.78 

1.73 

1.70 

1.67 

1.64 

1.61 

1.57 

1.53 

25 

1.87 

1.82

1.77 

1.72 

1.69 

1.66 

1.63 

1.59 

1.56 

1.52 

26 

1.86 

1.81 

1.76 

1.71 

1.68 

1.65 

1.61 

1.58 

1.54 

1.50 

27 

1.85 

1.80 

1.75 

1.70 

1.67 

1.64 

1.60 

1.57 

1.53 

1.49 

28 

1.84 

1.79 

1.74 

1.69 

1.66 

1.63 

1.59 

1.56 

1.52 

1.48 

29 

1.83 

1.78 

1.73 

1.68

1.65 

1.62 

1.58 

1.55 

1.51 

1.47 

30 

1.82 

1.77 

1.72 

1.67 

1.64 

1.61 

1.57 

1.54 

1.50 

1.46 

40 

1.76 

1.71 

1.66 

1.61 

1.57 

1.54 

1.51 

1.47 

1.42 

1.38 

60 

1.71 

1.66 

1.60

1.54 

1.51 

1.48 

1.44 

1.40 

1.35 

1.29

120 

1.65 

1.60 

1.55 

1.48 

1.45 

1.41 

1.37 

1.32 

1.26 

1.19 

∞

1.60 

1.55 

1.49 

1.42 

1.38 

1.34 

1.30 

1.24 

1.17 

1.00 


TABLE G 

(continued) 


F0.95

Numerator degrees of freedom

Denominator
degrees of
freedom

1

2

3

4

5

6

7

8

9

1 

161.4 

199.5 

215.7 

224.6 

230.2 

234.0 

236.8 

238.9 

240.5 

2 

18.51 

19.00 

19.16 

19.25 

19.30 

19.33 

19.35 

19.37 

19.38 

3 

10.13 

9.55 

9.28 

9.12 

9.01 

8.94 

8.89 

8.85 

8.81 

4 

7.71 

6.94 

6.59 

6.39 

6.26 

6.16 

6.09 

6.04 

6.00 

5 

6.61 

5.79 

5.41 

5.19 

5.05 

4.95 

4.88 

4.82 

4.77 

6 

5.99 

5.14 

4.76 

4.53 

4.39 

4.28 

4.21 

4.15 

4.10 

7 

5.59 

4.74 

4.35 

4.12 

3.97 

3.87 

3.79 

3.73 

3.68 

8 

5.32 

4.46 

4.07 

3.84 

3.69 

3.58 

3.50 

3.44 

3.39 

9 

5.12 

4.26 

3.86 

3.63 

3.48 

3.37 

3.29 

3.23 

3.18 

10 

4.96 

4.10 

3.71 

3.48 

3.33 

3.22 

3.14 

3.07 

3.02 

11 

4.84 

3.98 

3.59

3.36

3.20

3.09 

3.01 

2.95 

2.90 

12 

4.75 

3.89 

3.49 

3.26 

3.11 

3.00 

2.91 

2.85 

2.80 

13 

4.67 

3.81 

3.41 

3.18 

3.03 

2.92 

2.83 

2.77 

2.71 

14 

4.60 

3.74 

3.34 

3.11 

2.96 

2.85 

2.76 

2.70 

2.65 

15 

4.54 

3.68 

3.29 

3.06 

2.90 

2.79 

2.71 

2.64 

2.59 

16 

4.49 

3.63 

3.24 

3.01 

2.85 

2.74 

2.66 

2.59 

2.54 

17 

4.45 

3.59 

3.20 

2.96 

2.81 

2.70 

2.61 

2.55 

2.49 

18 

4.41 

3.55 

3.16 

2.93 

2.77 

2.66 

2.58 

2.51 

2.46 

19 

4.38 

3.52 

3.13 

2.90 

2.74 

2.63 

2.54 

2.48 

2.42 

20 

4.35 

3.49 

3.10 

2.87 

2.71 

2.60 

2.51 

2.45 

2.39 

21 

4.32 

3.47 

3.07 

2.84 

2.68 

2.57 

2.49 

2.42 

2.37 

22 

4.30 

3.44 

3.05 

2.82 

2.66 

2.55 

2.46 

2.40 

2.34 

23 

4.28 

3.42 

3.03 

2.80 

2.64 

2.53 

2.44 

2.37 

2.32 

24 

4.26 

3.40 

3.01 

2.78 

2.62 

2.51 

2.42 

2.36 

2.30 

25 

4.24 

3.39 

2.99 

2.76 

2.60 

2.49 

2.40 

2.34 

2.28 

26 

4.23 

3.37 

2.98 

2.74 

2.59 

2.47 

2.39 

2.32 

2.27 

27 

4.21 

3.35 

2.96 

2.73 

2.57 

2.46 

2.37 

2.31 

2.25 

28 

4.20 

3.34 

2.95 

2.71 

2.56 

2.45 

2.36 

2.29

2.24 

29 

4.18 

3.33 

2.93 

2.70 

2.55 

2.43 

2.35 

2.28 

2.22 

30 

4.17 

3.32 

2.92 

2.69 

2.53 

2.42 

2.33 

2.27 

2.21 

40 

4.08 

3.23 

2.84 

2.61 

2.45 

2.34 

2.25 

2.18 

2.12

60 

4.00 

3.15 

2.76 

2.53

2.37 

2.25 

2.17 

2.10 

2.04 

120 

3.92 

3.07 

2.68 

2.45 

2.29 

2.17 

2.09 

2.02 

1.96 

∞

3.84 

3.00 

2.60 

2.37 

2.21 

2.10 

2.01 

1.94 

1.88 





TABLE G 

(continued)


F0.95

Numerator degrees of freedom

Denominator
degrees of
freedom

10 

12 

15 

20 

24 

30 

40 

60 

120 

∞

1 

241.9 

243.9 

245.9 

248.0 

249.1 

250.1 

251.1 

252.2 

253.3 

254.3 

2 

19.40 

19.41 

19.43 

19.45 

19.45 

19.46 

19.47 

19.48 

19.49 

19.50 

3 

8.79 

8.74 

8.70 

8.66 

8.64 

8.62 

8.59 

8.57 

8.55 

8.53 

4 

5.96 

5.91 

5.86 

5.80 

5.77 

5.75 

5.72 

5.69 

5.66 

5.63 

5 

4.74 

4.68 

4.62 

4.56 

4.53 

4.50 

4.46 

4.43 

4.40 

4.36 

6 

4.06 

4.00 

3.94 

3.87 

3.84 

3.81 

3.77 

3.74 

3.70 

3.67 

7 

3.64 

3.57 

3.51 

3.44 

3.41 

3.38 

3.34 

3.30 

3.27 

3.23 

8 

3.35 

3.28 

3.22 

3.15 

3.12 

3.08 

3.04 

3.01 

2.97 

2.93 

9 

3.14 

3.07 

3.01 

2.94 

2.90 

2.86 

2.83 

2.79 

2.75 

2.71 

10 

2.98 

2.91 

2.85 

2.77 

2.74 

2.70 

2.66 

2.62 

2.58 

2.54 

11 

2.85 

2.79 

2.72 

2.65 

2.61 

2.57 

2.53 

2.49 

2.45 

2.40 

12 

2.75 

2.69 

2.62 

2.54 

2.51 

2.47 

2.43 

2.38 

2.34 

2.30 

13 

2.67 

2.60 

2.53 

2.46 

2.42 

2.38 

2.34 

2.30 

2.25 

2.21 

14 

2.60 

2.53 

2.46 

2.39 

2.35 

2.31 

2.27 

2.22 

2.18 

2.13 

15 

2.54 

2.48 

2.40 

2.33 

2.29 

2.25 

2.20 

2.16 

2.11 

2.07 

16 

2.49 

2.42 

2.35 

2.28 

2.24 

2.19 

2.15 

2.11 

2.06 

2.01 

17 

2.45 

2.38 

2.31 

2.23 

2.19 

2.15 

2.10 

2.06 

2.01 

1.96 

18 

2.41 

2.34 

2.27 

2.19 

2.15 

2.11 

2.06 

2.02 

1.97 

1.92 

19 

2.38 

2.31 

2.23 

2.16 

2.11 

2.07 

2.03 

1.98 

1.93 

1.88 

20 

2.35 

2.28 

2.20 

2.12 

2.08 

2.04 

1.99 

1.95 

1.90 

1.84 

21 

2.32 

2.25 

2.18 

2.10 

2.05 

2.01 

1.96 

1.92 

1.87 

1.81 

22 

2.30 

2.23 

2.15 

2.07 

2.03 

1.98 

1.94 

1.89 

1.84 

1.78 

23 

2.27 

2.20 

2.13 

2.05 

2.01 

1.96 

1.91 

1.86 

1.81 

1.76 

24 

2.25 

2.18 

2.11 

2.03 

1.98 

1.94 

1.89 

1.84 

1.79 

1.73 

25 

2.24 

2.16 

2.09 

2.01 

1.96 

1.92 

1.87 

1.82 

1.77 

1.71 

26 

2.22 

2.15 

2.07 

1.99 

1.95 

1.90 

1.85 

1.80 

1.75 

1.69 

27 

2.20 

2.13 

2.06 

1.97 

1.93 

1.88 

1.84 

1.79 

1.73 

1.67 

28 

2.19 

2.12 

2.04 

1.96 

1.91 

1.87 

1.82 

1.77 

1.71 

1.65 

29 

2.18 

2.10 

2.03 

1.94 

1.90 

1.85 

1.81 

1.75 

1.70 

1.64 

30 

2.16 

2.09 

2.01 

1.93 

1.89 

1.84 

1.79 

1.74 

1.68 

1.62 

40 

2.08 

2.00 

1.92 

1.84 

1.79 

1.74 

1.69 

1.64 

1.58 

1.51 

60 

1.99 

1.92 

1.84 

1.75 

1.70 

1.65 

1.59 

1.53 

1.47 

1.39 

120 

1.91 

1.83 

1.75 

1.66 

1.61 

1.55 

1.50 

1.43 

1.35 

1.25 

∞

1.83 

1.75 

1.67 

1.57 

1.52 

1.46 

1.39 

1.32 

1.22 

1.00 



TABLE G 

(continued)


F_{0.975}


Denominator 
degrees of 
freedom 

1 

2 

Numerator degrees of freedom 

3 4 5 6 

7 

8 

9 

1 

647.8 

799.5 

864.2 

899.6 

921.8 

937.1 

948.2 

956.7 

963.3 

2 

38.51 

39.00 

39.17 

39.25 

39.30 

39.33 

39.36 

39.37 

39.39 

3 

17.44 

16.04 

15.44 

15.10 

14.88 

14.73 

14.62 

14.54 

14.47 

4 

12.22 

10.65 

9.98 

9.60 

9.36 

9.20 

9.07 

8.98 

8.90 

5 

10.01 

8.43 

7.76 

7.39 

7.15 

6.98 

6.85 

6.76 

6.68 

6 

8.81 

7.26 

6.60 

6.23 

5.99 

5.82 

5.70 

5.60 

5.52 

7 

8.07 

6.54 

5.89 

5.52 

5.29 

5.12 

4.99 

4.90 

4.82 

8 

7.57 

6.06 

5.42 

5.05 

4.82 

4.65 

4.53 

4.43 

4.36 

9 

7.21 

5.71 

5.08 

4.72 

4.48 

4.32 

4.20 

4.10 

4.03 

10 

6.94 

5.46 

4.83 

4.47 

4.24 

4.07 

3.95 

3.85 

3.78 

11 

6.72 

5.26 

4.63 

4.28 

4.04 

3.88 

3.76 

3.66 

3.59 

12 

6.55 

5.10 

4.47 

4.12 

3.89 

3.73 

3.61 

3.51 

3.44 

13 

6.41 

4.97 

4.35 

4.00 

3.77 

3.60 

3.48 

3.39 

3.31 

14 

6.30 

4.86 

4.24 

3.89 

3.66 

3.50 

3.38 

3.29 

3.21 

15 

6.20 

4.77 

4.15 

3.80 

3.58 

3.41 

3.29 

3.20 

3.12 

16 

6.12 

4.69 

4.08 

3.73 

3.50 

3.34 

3.22 

3.12 

3.05 

17 

6.04 

4.62 

4.01 

3.66 

3.44 

3.28 

3.16 

3.06 

2.98 

18 

5.98 

4.56 

3.95 

3.61 

3.38 

3.22 

3.10 

3.01 

2.93 

19 

5.92 

4.51 

3.90 

3.56 

3.33 

3.17 

3.05 

2.96 

2.88 

20 

5.87 

4.46

3.86 

3.51 

3.29 

3.13 

3.01 

2.91 

2.84 

21 

5.83 

4.42 

3.82 

3.48 

3.25 

3.09 

2.97 

2.87 

2.80 

22 

5.79 

4.38 

3.78 

3.44 

3.22 

3.05 

2.93 

2.84 

2.76 

23 

5.75 

4.35 

3.75 

3.41 

3.18 

3.02 

2.90 

2.81 

2.73 

24 

5.72 

4.32 

3.72 

3.38

3.15 

2.99 

2.87 

2.78 

2.70 

25 

5.69 

4.29 

3.69 

3.35 

3.13 

2.97 

2.85 

2.75 

2.68 

26 

5.66 

4.27 

3.67 

3.33 

3.10 

2.94 

2.82 

2.73 

2.65 

27 

5.63 

4.24 

3.65 

3.31 

3.08 

2.92 

2.80 

2.71 

2.63 

28 

5.61 

4.22 

3.63 

3.29 

3.06 

2.90 

2.78 

2.69 

2.61 

29 

5.59 

4.20

3.61 

3.27 

3.04 

2.88 

2.76 

2.67 

2.59 

30 

5.57 

4.18 

3.59 

3.25 

3.03 

2.87 

2.75 

2.65 

2.57 

40 

5.42 

4.05 

3.46 

3.13 

2.90 

2.74 

2.62 

2.53 

2.45 

60 

5.29 

3.93

3.34 

3.01 

2.79 

2.63 

2.51 

2.41 

2.33 

120 

5.15 

3.80

3.23 

2.89 

2.67 

2.52 

2.39 

2.30 

2.22 

∞

5.02 

3.69 

3.12 

2.79 

2.57 

2.41 

2.29 

2.19 

2.11 



TABLE G

(continued)

Denominator 

degrees of Numerator degrees of freedom 


freedom 

10 

12 

15 

20 

24 

30 

40 

60 

120 

∞

1 

968.6 

976.7 

984.9 

993.1 

997.2 

1001 

1006 

1010 

1014 

1018 

2 

39.40 

39.41 

39.43 

39.45 

39.46 

39.46 

39.47 

39.48 

39.49 

39.50 

3 

14.42 

14.34 

14.25 

14.17 

14.12 

14.08 

14.04 

13.99 

13.95 

13.90 

4 

8.84 

8.75 

8.66 

8.56 

8.51 

8.46 

8.41 

8.36 

8.31 

8.26 

5 

6.62 

6.52 

6.43 

6.33 

6.28 

6.23 

6.18 

6.12 

6.07 

6.02 

6 

5.46 

5.37 

5.27 

5.17 

5.12 

5.07 

5.01 

4.96 

4.90 

4.85 

7 

4.76 

4.67 

4.57 

4.47 

4.42 

4.36 

4.31 

4.25 

4.20 

4.14 

8 

4.30 

4.20 

4.10 

4.00 

3.95 

3.89 

3.84 

3.78 

3.73 

3.67 

9 

3.96 

3.87 

3.77 

3.67 

3.61 

3.56 

3.51 

3.45 

3.39 

3.33 

10 

3.72 

3.62 

3.52 

3.42 

3.37 

3.31 

3.26 

3.20 

3.14 

3.08 

11 

3.53 

3.43 

3.33 

3.23 

3.17 

3.12 

3.06 

3.00 

2.94 

2.88 

12 

3.37 

3.28 

3.18 

3.07 

3.02 

2.96 

2.91 

2.85 

2.79 

2.72 

13 

3.25 

3.15 

3.05 

2.95 

2.89 

2.84 

2.78 

2.72 

2.66 

2.60 

14 

3.15 

3.05 

2.95 

2.84

2.79 

2.73 

2.67 

2.61 

2.55 

2.49 

15 

3.06 

2.96 

2.86 

2.76 

2.70 

2.64 

2.59 

2.52 

2.46 

2.40 

16 

2.99

2.89 

2.79 

2.68 

2.63 

2.57 

2.51 

2.45 

2.38 

2.32 

17 

2.92 

2.82 

2.72 

2.62 

2.56 

2.50 

2.44 

2.38 

2.32 

2.25 

18 

2.87 

2.77 

2.67 

2.56 

2.50 

2.44 

2.38 

2.32 

2.26 

2.19 

19 

2.82 

2.72 

2.62 

2.51 

2.45 

2.39 

2.33 

2.27 

2.20 

2.13 

20 

2.77 

2.68 

2.57 

2.46 

2.41 

2.35 

2.29 

2.22 

2.16 

2.09 

21 

2.73 

2.64 

2.53 

2.42 

2.37 

2.31 

2.25 

2.18 

2.11 

2.04 

22 

2.70 

2.60 

2.50 

2.39 

2.33 

2.27 

2.21 

2.14 

2.08 

2.00 

23 

2.67 

2.57 

2.47 

2.36 

2.30 

2.24 

2.18 

2.11 

2.04 

1.97 

24 

2.64 

2.54 

2.44 

2.33 

2.27 

2.21 

2.15 

2.08 

2.01 

1.94 

25 

2.61

2.51 

2.41 

2.30 

2.24 

2.18 

2.12 

2.05 

1.98 

1.91 

26 

2.59 

2.49 

2.39 

2.28 

2.22 

2.16 

2.09 

2.03 

1.95 

1.88 

27 

2.57 

2.47 

2.36 

2.25 

2.19 

2.13 

2.07 

2.00 

1.93 

1.85 

28 

2.55 

2.45 

2.34 

2.23 

2.17 

2.11 

2.05 

1.98 

1.91 

1.83 

29 

2.53 

2.43 

2.32 

2.21 

2.15 

2.09 

2.03 

1.96 

1.89 

1.81 

30 

2.51 

2.41 

2.31 

2.20 

2.14 

2.07 

2.01 

1.94 

1.87 

1.79 

40 

2.39 

2.29 

2.18 

2.07 

2.01 

1.94 

1.88 

1.80 

1.72 

1.64 

60 

2.27 

2.17 

2.06 

1.94 

1.88 

1.82 

1.74 

1.67 

1.58 

1.48 

120 

2.16 

2.05 

1.94 

1.82 

1.76 

1.69 

1.61 

1.53 

1.43 

1.31 

∞

2.05

1.94

1.83 

1.71 

1.64 

1.57 

1.48 

1.39 

1.27 

1.00 






TABLE G 

(continued)


F_{0.99}


Denominator 
degrees of 
freedom 

1 

2 

3 

Numerator degrees of freedom 
4 5 6 

7 

8 

9 

1 

4052 

4999.5 

5403 

5625 

5764 

5859 

5928 

5981 

6022 

2 

98.50 

99.00 

99.17 

99.25 

99.30 

99.33 

99.36 

99.37 

99.39 

3 

34.12 

30.82 

29.46 

28.71 

28.24 

27.91 

27.67 

27.49 

27.35 

4 

21.20 

18.00 

16.69 

15.98 

15.52 

15.21 

14.98 

14.80 

14.66 

5 

16.26 

13.27 

12.06 

11.39 

10.97 

10.67 

10.46 

10.29 

10.16 

6 

13.75 

10.92 

9.78 

9.15 

8.75 

8.47 

8.26 

8.10 

7.98 

7 

12.25 

9.55 

8.45 

7.85 

7.46 

7.19 

6.99 

6.84 

6.72 

8 

11.26 

8.65 

7.59 

7.01 

6.63 

6.37 

6.18 

6.03 

5.91 

9 

10.56 

8.02 

6.99 

6.42 

6.06 

5.80 

5.61 

5.47 

5.35 

10 

10.04 

7.56 

6.55 

5.99 

5.64 

5.39 

5.20 

5.06 

4.94 

11 

9.65 

7.21 

6.22 

5.67 

5.32 

5.07 

4.89 

4.74 

4.63 

12 

9.33 

6.93 

5.95 

5.41 

5.06 

4.82 

4.64 

4.50 

4.39 

13 

9.07 

6.70 

5.74 

5.21 

4.86 

4.62 

4.44 

4.30 

4.19 

14 

8.86 

6.51 

5.56 

5.04 

4.69 

4.46 

4.28 

4.14 

4.03 

15 

8.68 

6.36 

5.42 

4.89 

4.56 

4.32 

4.14 

4.00 

3.89 

16 

8.53 

6.23 

5.29 

4.77 

4.44 

4.20 

4.03 

3.89 

3.78 

17 

8.40 

6.11 

5.18 

4.67 

4.34 

4.10 

3.93 

3.79 

3.68 

18 

8.29 

6.01 

5.09 

4.58 

4.25 

4.01 

3.84 

3.71 

3.60 

19 

8.18 

5.93 

5.01 

4.50 

4.17 

3.94 

3.77 

3.63 

3.52 

20 

8.10 

5.85 

4.94 

4.43 

4.10 

3.87 

3.70 

3.56 

3.46 

21 

8.02 

5.78 

4.87 

4.37 

4.04 

3.81 

3.64 

3.51 

3.40 

22 

7.95 

5.72 

4.82 

4.31 

3.99 

3.76 

3.59 

3.45 

3.35 

23 

7.88 

5.66 

4.76 

4.26 

3.94 

3.71 

3.54 

3.41 

3.30 

24 

7.82 

5.61 

4.72 

4.22 

3.90 

3.67 

3.50 

3.36 

3.26 

25 

7.77 

5.57 

4.68 

4.18 

3.85 

3.63 

3.46 

3.32 

3.22 

26 

7.72 

5.53 

4.64 

4.14 

3.82 

3.59 

3.42 

3.29 

3.18 

27 

7.68 

5.49 

4.60 

4.11 

3.78 

3.56 

3.39 

3.26 

3.15 

28 

7.64 

5.45 

4.57 

4.07 

3.75 

3.53 

3.36 

3.23 

3.12 

29 

7.60 

5.42 

4.54 

4.04 

3.73 

3.50 

3.33 

3.20 

3.09 

30 

7.56 

5.39 

4.51 

4.02 

3.70 

3.47 

3.30 

3.17 

3.07 

40 

7.31 

5.18 

4.31 

3.83 

3.51 

3.29 

3.12 

2.99 

2.89 

60 

7.08 

4.98 

4.13 

3.65 

3.34 

3.12 

2.95 

2.82 

2.72 

120 

6.85 

4.79 

3.95 

3.48 

3.17 

2.96 

2.79 

2.66 

2.56 

∞

6.63 

4.61 

3.78 

3.32 

3.02 

2.80 

2.64 

2.51 

2.41 



TABLE G 

(continued)


Denominator 

degrees of Numerator degrees of freedom 


freedom 

10 

12 

15 

20 

24 

30 

40 

60 

120 

∞

1 

6056 

6106 

6157 

6209 

6235 

6261 

6287 

6313 

6339 

6366 

2 

99.40 

99.42 

99.43 

99.45 

99.46 

99.47 

99.47 

99.48 

99.49 

99.50 

3 

27.23 

27.05 

26.87 

26.69 

26.60 

26.50 

26.41 

26.32 

26.22 

26.13 

4 

14.55 

14.37 

14.20 

14.02 

13.93 

13.84 

13.75 

13.65 

13.56 

13.46 

5 

10.05 

9.89 

9.72 

9.55 

9.47 

9.38 

9.29 

9.20 

9.11 

9.02 

6 

7.87 

7.72 

7.56 

7.40 

7.31 

7.23 

7.14 

7.06 

6.97 

6.88 

7 

6.62 

6.47 

6.31 

6.16 

6.07 

5.99 

5.91 

5.82 

5.74 

5.65 

8 

5.81 

5.67 

5.52 

5.36 

5.28 

5.20 

5.12 

5.03 

4.95 

4.86 

9 

5.26 

5.11 

4.96 

4.81 

4.73 

4.65 

4.57 

4.48 

4.40 

4.31 

10 

4.85 

4.71 

4.56 

4.41 

4.33 

4.25 

4.17 

4.08 

4.00 

3.91 

11 

4.54 

4.40 

4.25 

4.10 

4.02 

3.94 

3.86 

3.78 

3.69 

3.60 

12 

4.30 

4.16 

4.01 

3.86 

3.78 

3.70 

3.62 

3.54 

3.45 

3.36 

13 

4.10 

3.96 

3.82 

3.66 

3.59 

3.51 

3.43 

3.34 

3.25 

3.17 

14 

3.94 

3.80 

3.66 

3.51 

3.43 

3.35 

3.27 

3.18 

3.09 

3.00 

15 

3.80 

3.67 

3.52 

3.37 

3.29 

3.21 

3.13 

3.05 

2.96 

2.87 

16 

3.69 

3.55 

3.41 

3.26 

3.18 

3.10 

3.02 

2.93 

2.84 

2.75 

17 

3.59 

3.46 

3.31 

3.16 

3.08 

3.00 

2.92 

2.83 

2.75 

2.65 

18 

3.51 

3.37 

3.23 

3.08 

3.00 

2.92 

2.84 

2.75 

2.66 

2.57 

19 

3.43 

3.30 

3.15 

3.00 

2.92 

2.84 

2.76 

2.67 

2.58 

2.49 

20 

3.37 

3.23 

3.09 

2.94 

2.86 

2.78 

2.69 

2.61 

2.52 

2.42 

21 

3.31 

3.17 

3.03 

2.88 

2.80 

2.72 

2.64 

2.55 

2.46 

2.36 

22 

3.26 

3.12 

2.98 

2.83 

2.75 

2.67 

2.58 

2.50 

2.40 

2.31 

23 

3.21 

3.07 

2.93 

2.78 

2.70 

2.62 

2.54 

2.45 

2.35 

2.26 

24 

3.17 

3.03 

2.89 

2.74 

2.66 

2.58 

2.49 

2.40 

2.31 

2.21 

25 

3.13 

2.99 

2.85 

2.70 

2.62 

2.54 

2.45 

2.36 

2.27 

2.17 

26 

3.09 

2.96 

2.81 

2.66 

2.58 

2.50 

2.42 

2.33 

2.23 

2.13 

27 

3.06 

2.93 

2.78 

2.63 

2.55 

2.47 

2.38 

2.29 

2.20 

2.10 

28 

3.03 

2.90 

2.75 

2.60 

2.52 

2.44 

2.35 

2.26 

2.17 

2.06 

29 

3.00 

2.87 

2.73 

2.57 

2.49 

2.41 

2.33 

2.23 

2.14 

2.03 

30 

2.98 

2.84 

2.70 

2.55 

2.47 

2.39 

2.30 

2.21 

2.11 

2.01 

40 

2.80 

2.66 

2.52 

2.37 

2.29 

2.20 

2.11 

2.02 

1.92 

1.80 

60 

2.63 

2.50 

2.35 

2.20 

2.12 

2.03 

1.94 

1.84 

1.73 

1.60 

120 

2.47 

2.34 

2.19 

2.03 

1.95 

1.86 

1.76 

1.66 

1.53 

1.38 

∞

2.32 

2.18 

2.04 

1.88 

1.79 

1.70 

1.59 

1.47 

1.32 

1.00 


TABLE G 

(continued) 


F_{0.995}

Denominator
degrees of                        Numerator degrees of freedom
freedom          1       2       3       4       5       6       7       8       9

    1        16211   20000   21615   22500   23056   23437   23715   23925   24091
    2        198.5   199.0   199.2   199.2   199.3   199.3   199.4   199.4   199.4
    3        55.55   49.80   47.47   46.19   45.39   44.84   44.43   44.13   43.88
    4        31.33   26.28   24.26   23.15   22.46   21.97   21.62   21.35   21.14
    5        22.78   18.31   16.53   15.56   14.94   14.51   14.20   13.96   13.77
    6        18.63   14.54   12.92   12.03   11.46   11.07   10.79   10.57   10.39
    7        16.24   12.40   10.88   10.05    9.52    9.16    8.89    8.68    8.51
    8        14.69   11.04    9.60    8.81    8.30    7.95    7.69    7.50    7.34
    9        13.61   10.11    8.72    7.96    7.47    7.13    6.88    6.69    6.54
   10        12.83    9.43    8.08    7.34    6.87    6.54    6.30    6.12    5.97
   11        12.23    8.91    7.60    6.88    6.42    6.10    5.86    5.68    5.54
   12        11.75    8.51    7.23    6.52    6.07    5.76    5.52    5.35    5.20
   13        11.37    8.19    6.93    6.23    5.79    5.48    5.25    5.08    4.94
   14        11.06    7.92    6.68    6.00    5.56    5.26    5.03    4.86    4.72
   15        10.80    7.70    6.48    5.80    5.37    5.07    4.85    4.67    4.54
   16        10.58    7.51    6.30    5.64    5.21    4.91    4.69    4.52    4.38
   17        10.38    7.35    6.16    5.50    5.07    4.78    4.56    4.39    4.25
   18        10.22    7.21    6.03    5.37    4.96    4.66    4.44    4.28    4.14
   19        10.07    7.09    5.92    5.27    4.85    4.56    4.34    4.18    4.04
   20         9.94    6.99    5.82    5.17    4.76    4.47    4.26    4.09    3.96
   21         9.83    6.89    5.73    5.09    4.68    4.39    4.18    4.01    3.88
   22         9.73    6.81    5.65    5.02    4.61    4.32    4.11    3.94    3.81
   23         9.63    6.73    5.58    4.95    4.54    4.26    4.05    3.88    3.75
   24         9.55    6.66    5.52    4.89    4.49    4.20    3.99    3.83    3.69
   25         9.48    6.60    5.46    4.84    4.43    4.15    3.94    3.78    3.64
   26         9.41    6.54    5.41    4.79    4.38    4.10    3.89    3.73    3.60
   27         9.34    6.49    5.36    4.74    4.34    4.06    3.85    3.69    3.56
   28         9.28    6.44    5.32    4.70    4.30    4.02    3.81    3.65    3.52
   29         9.23    6.40    5.28    4.66    4.26    3.98    3.77    3.61    3.48
   30         9.18    6.35    5.24    4.62    4.23    3.95    3.74    3.58    3.45
   40         8.83    6.07    4.98    4.37    3.99    3.71    3.51    3.35    3.22
   60         8.49    5.79    4.73    4.14    3.76    3.49    3.29    3.13    3.01
  120         8.18    5.54    4.50    3.92    3.55    3.28    3.09    2.93    2.81
    ∞         7.88    5.30    4.28    3.72    3.35    3.09    2.90    2.74    2.62




TABLE G 

(continued) 


Denominator 

degrees of Numerator degrees of freedom 


freedom 

10 

12 

15 

20 

24 

30 

40 

60 

120 

∞

1 

24224 

24426 

24630 

24836 

24940 

25044 

25148 

25253 

25359 

25465 

2 

199.4 

199.4 

199.4 

199.4 

199.5 

199.5 

199.5 

199.5 

199.5 

199.5 

3 

43.69 

43.39 

43.08 

42.78 

42.62 

42.47 

42.31 

42.15 

41.99 

41.83 

4 

20.97 

20.70 

20.44 

20.17 

20.03 

19.89 

19.75 

19.61 

19.47 

19.32 

5 

13.62 

13.38 

13.15 

12.90 

12.78 

12.66 

12.53 

12.40 

12.27 

12.14 

6 

10.25 

10.03 

9.81 

9.59 

9.47 

9.36 

9.24 

9.12 

9.00 

8.88 

7 

8.38 

8.18 

7.97 

7.75 

7.65 

7.53 

7.42 

7.31 

7.19 

7.08 

8 

7.21 

7.01 

6.81 

6.61 

6.50 

6.40 

6.29 

6.18 

6.06 

5.95 

9 

6.42 

6.23 

6.03 

5.83 

5.73 

5.62

5.52 

5.41 

5.30 

5.19 

10 

5.85 

5.66 

5.47 

5.27 

5.17 

5.07 

4.97 

4.86 

4.75 

4.64 

11 

5.42 

5.24 

5.05 

4.86 

4.76 

4.65 

4.55 

4.44 

4.34 

4.23 

12 

5.09 

4.91 

4.72 

4.53 

4.43 

4.33 

4.23 

4.12 

4.01 

3.90 

13 

4.82 

4.64 

4.46 

4.27 

4.17 

4.07 

3.97 

3.87 

3.76 

3.65 

14 

4.60 

4.43 

4.25 

4.06 

3.96 

3.86 

3.76 

3.66 

3.55 

3.44 

15 

4.42 

4.25 

4.07 

3.88 

3.79 

3.69 

3.58 

3.48 

3.37 

3.26 

16 

4.27 

4.10 

3.92 

3.73 

3.64 

3.54 

3.44 

3.33 

3.22 

3.11 

17 

4.14 

3.97 

3.79 

3.61 

3.51 

3.41 

3.31 

3.21 

3.10 

2.98 

18 

4.03 

3.86 

3.68 

3.50 

3.40 

3.30 

3.20 

3.10 

2.99 

2.87 

19 

3.93 

3.76 

3.59 

3.40 

3.31 

3.21 

3.11 

3.00 

2.89 

2.78 

20 

3.85 

3.68 

3.50 

3.32 

3.22 

3.12 

3.02 

2.92 

2.81 

2.69 

21 

3.77 

3.60 

3.43 

3.24 

3.15 

3.05 

2.95 

2.84 

2.73 

2.61 

22 

3.70 

3.54 

3.36 

3.18 

3.08 

2.98 

2.88 

2.77 

2.66 

2.55 

23 

3.64 

3.47 

3.30 

3.12 

3.02 

2.92 

2.82 

2.71 

2.60 

2.48 

24 

3.59 

3.42 

3.25 

3.06 

2.97 

2.87 

2.77 

2.66 

2.55 

2.43 

25 

3.54 

3.37 

3.20 

3.01 

2.92 

2.82 

2.72 

2.61 

2.50 

2.38 

26 

3.49 

3.33 

3.15 

2.97 

2.87 

2.77 

2.67 

2.56 

2.45 

2.33 

27 

3.45 

3.28 

3.11 

2.93 

2.83 

2.73 

2.63 

2.52 

2.41 

2.29 

28 

3.41 

3.25 

3.07 

2.89 

2.79 

2.69 

2.59 

2.48 

2.37 

2.25 

29 

3.38 

3.21 

3.04 

2.86 

2.76 

2.66 

2.56 

2.45 

2.33 

2.21 

30 

3.34 

3.18 

3.01 

2.82 

2.73 

2.63 

2.52 

2.42 

2.30 

2.18 

40 

3.12 

2.95 

2.78 

2.60 

2.50 

2.40 

2.30 

2.18 

2.06 

1.93 

60 

2.90 

2.74 

2.57 

2.39 

2.29 

2.19 

2.08 

1.96 

1.83 

1.69 

120 

2.71 

2.54 

2.37 

2.19 

2.09 

1.98 

1.87 

1.75 

1.61 

1.43 

∞

2.52 

2.36 

2.19 

2.00 

1.90 

1.79 

1.67 

1.53 

1.36 

1.00 


TABLE H 

Percentage points 
of the Studentized 
range for 2 
through 20 
treatments 


Upper 5% points 


Error
df          2       3       4       5       6       7       8       9      10

   1    17.97   26.98   32.82   37.08   40.41   43.12   45.40   47.36   49.07
   2     6.08    8.33    9.80   10.88   11.74   12.44   13.03   13.54   13.99
   3     4.50    5.91    6.82    7.50    8.04    8.48    8.85    9.18    9.46
   4     3.93    5.04    5.76    6.29    6.71    7.05    7.35    7.60    7.83
   5     3.64    4.60    5.22    5.67    6.03    6.33    6.58    6.80    6.99
   6     3.46    4.34    4.90    5.30    5.63    5.90    6.12    6.32    6.49
   7     3.34    4.16    4.68    5.06    5.36    5.61    5.82    6.00    6.16
   8     3.26    4.04    4.53    4.89    5.17    5.40    5.60    5.77    5.92
   9     3.20    3.95    4.41    4.76    5.02    5.24    5.43    5.59    5.74
  10     3.15    3.88    4.33    4.65    4.91    5.12    5.30    5.46    5.60
  11     3.11    3.82    4.26    4.57    4.82    5.03    5.20    5.35    5.49
  12     3.08    3.77    4.20    4.51    4.75    4.95    5.12    5.27    5.39
  13     3.06    3.73    4.15    4.45    4.69    4.88    5.05    5.19    5.32
  14     3.03    3.70    4.11    4.41    4.64    4.83    4.99    5.13    5.25
  15     3.01    3.67    4.08    4.37    4.59    4.78    4.94    5.08    5.20
  16     3.00    3.65    4.05    4.33    4.56    4.74    4.90    5.03    5.15
  17     2.98    3.63    4.02    4.30    4.52    4.70    4.86    4.99    5.11
  18     2.97    3.61    4.00    4.28    4.49    4.67    4.82    4.96    5.07
  19     2.96    3.59    3.98    4.25    4.47    4.65    4.79    4.92    5.04
  20     2.95    3.58    3.96    4.23    4.45    4.62    4.77    4.90    5.01
  24     2.92    3.53    3.90    4.17    4.37    4.54    4.68    4.81    4.92
  30     2.89    3.49    3.85    4.10    4.30    4.46    4.60    4.72    4.82
  40     2.86    3.44    3.79    4.04    4.23    4.39    4.52    4.63    4.73
  60     2.83    3.40    3.74    3.98    4.16    4.31    4.44    4.55    4.65
 120     2.80    3.36    3.68    3.92    4.10    4.24    4.36    4.47    4.56
   ∞     2.77    3.31    3.63    3.86    4.03    4.17    4.29    4.39    4.47


Error
df         11      12      13      14      15      16      17      18      19      20

   1    50.59   51.96   53.20   54.33   55.36   56.32   57.22   58.04   58.83   59.56
   2    14.39   14.75   15.08   15.38   15.65   15.91   16.14   16.37   16.57   16.77
   3     9.72    9.95   10.15   10.35   10.52   10.69   10.84   10.98   11.11   11.24
   4     8.03    8.21    8.37    8.52    8.66    8.79    8.91    9.03    9.13    9.23
   5     7.17    7.32    7.47    7.60    7.72    7.83    7.93    8.03    8.12    8.21
   6     6.65    6.79    6.92    7.03    7.14    7.24    7.34    7.43    7.51    7.59
   7     6.30    6.43    6.55    6.66    6.76    6.85    6.94    7.02    7.10    7.17
   8     6.05    6.18    6.29    6.39    6.48    6.57    6.65    6.73    6.80    6.87
   9     5.87    5.98    6.09    6.19    6.28    6.36    6.44    6.51    6.58    6.64
  10     5.72    5.83    5.93    6.03    6.11    6.19    6.27    6.34    6.40    6.47
  11     5.61    5.71    5.81    5.90    5.98    6.06    6.13    6.20    6.27    6.33
  12     5.51    5.61    5.71    5.80    5.88    5.95    6.02    6.09    6.15    6.21
  13     5.43    5.53    5.63    5.71    5.79    5.86    5.93    5.99    6.05    6.11
  14     5.36    5.46    5.55    5.64    5.71    5.79    5.85    5.91    5.97    6.03
  15     5.31    5.40    5.49    5.57    5.65    5.72    5.78    5.85    5.90    5.96
  16     5.26    5.35    5.44    5.52    5.59    5.66    5.73    5.79    5.84    5.90
  17     5.21    5.31    5.39    5.47    5.54    5.61    5.67    5.73    5.79    5.84
  18     5.17    5.27    5.35    5.43    5.50    5.57    5.63    5.69    5.74    5.79
  19     5.14    5.23    5.31    5.39    5.46    5.53    5.59    5.65    5.70    5.75
  20     5.11    5.20    5.28    5.36    5.43    5.49    5.55    5.61    5.66    5.71
  24     5.01    5.10    5.18    5.25    5.32    5.38    5.44    5.49    5.55    5.59
  30     4.92    5.00    5.08    5.15    5.21    5.27    5.33    5.38    5.43    5.47
  40     4.82    4.90    4.98    5.04    5.11    5.16    5.22    5.27    5.31    5.36
  60     4.73    4.81    4.88    4.94    5.00    5.06    5.11    5.15    5.20    5.24
 120     4.64    4.71    4.78    4.84    4.90    4.95    5.00    5.04    5.09    5.13
   ∞     4.55    4.62    4.68    4.74    4.80    4.85    4.89    4.93    4.97    5.01






TABLE H 
(continued)


Upper 1% points 


Error 

df 

2 

3 

4 

5 

6 

7 

8 

9 

10 

1 

90.03 

135.0 

164.3 

185.6 

202.2 215.8 227.2 

237.0 

245.6 

2 

14.04 

19.02 

22.29 

24.72 

26.63 28.20 

29.53 

30.68 

31.69 

3 

8.26 

10.62 

12.17 

13.33 

14.24 15.00 

15.64 

16.20 

16.69 

4 

6.51 

8.12 

9.17 

9.96 

10.58

11.10

11.55 

11.93 

12.27 

5 

5.70 

6.98 

7.80 

8.42 

8.91 

9.32 

9.67 

9.97 

10.24 

6 

5.24 

6.33 

7.03 

7.56 

7.97 

8.32 

8.61 

8.87 

9.10 

7 

4.95 

5.92 

6.54 

7.01 

7.37 

7.68 

7.94 

8.17 

8.37 

8 

4.75 

5.64 

6.20 

6.62 

6.96 

7.24 

7.47 

7.68 

7.86 

9 

4.60 

5.43 

5.96 

6.35 

6.66 

6.91 

7.13 

7.33 

7.49 

10 

4.48 

5.27 

5.77 

6.14 

6.43 

6.67 

6.87 

7.05 

7.21 

11 

4.39 

5.15 

5.62 

5.97 

6.25 

6.48 

6.67 

6.84 

6.99 

12 

4.32 

5.05 

5.50 

5.84 

6.10 

6.32 

6.51 

6.67 

6.81 

13 

4.26 

4.96 

5.40 

5.73 

5.98 

6.19 

6.37 

6.53 

6.67 

14 

4.21 

4.89 

5.32 

5.63 

5.88 

6.08 

6.26 

6.41 

6.54 

15 

4.17 

4.84 

5.25 

5.56 

5.80 

5.99 

6.16 

6.31 

6.44 

16 

4.13 

4.79 

5.19 

5.49 

5.72 

5.92 

6.08 

6.22 

6.35 

17 

4.10 

4.74 

5.14 

5.43 

5.66 

5.85 

6.01 

6.15 

6.27 

18 

4.07 

4.70 

5.09 

5.38 

5.60 

5.79 

5.94 

6.08 

6.20 

19 

4.05 

4.67 

5.05 

5.33 

5.55 

5.73 

5.89 

6.02 

6.14 

20 

4.02 

4.64 

5.02 

5.29 

5.51 

5.69 

5.84 

5.97 

6.09 

24 

3.96 

4.55 

4.91 

5.17 

5.37 

5.54 

5.69 

5.81 

5.92 

30 

3.89 

4.45 

4.80 

5.05 

5.24 

5.40 

5.54 

5.65 

5.76 

40 

3.82 

4.37 

4.70 

4.93 

5.11 

5.26 

5.39 

5.50 

5.60 

60 

3.76 

4.28 

4.59 

4.82 

4.99 

5.13 

5.25 

5.36 

5.45 

120 

3.70 

4.20 

4.50 

4.71 

4.87 

5.01 

5.12 

5.21 

5.30 

∞

3.64 

4.12 

4.40 

4.60 

4.76 

4.88 

4.99 

5.08 

5.16 

Error 

df 

11 

12 

13 

14 

15 16 

17 

18 

19 

20 

1 

253.2 

260.0 266.2 271.8 277.0 281.8 

286.3 

290.4 

294.3 

298.0 

2 

32.59 

33.40 

34.13 

34.81 35.43 36.00 

36.53 

37.03 

37.50 

37.95 

3 

17.13 

17.53 

17.89 

18.22 18.52 18.81 

19.07 

19.32 

19.55 

19.77 

4 

12.57 

12.84 

13.09 

13.32 13.53 13.73 

13.91 

14.08 

14.24 

14.40 

5 

10.48 

10.70 

10.89 

11.08 11.24 11.40 

11.55 

11.68 

11.81 

11.93 

6 

9.30 

9.48 

9.65 

9.81 

9.95 10.08 

10.21 

10.32 

10.43 

10.54 

7 

8.55 

8.71 

8.86 

9.00 

9.12 9.24 

9.35 

9.46 

9.55 

9.65 

8 

8.03 

8.18 

8.31 

8.44 

8.55 8.66 

8.76 

8.85 

8.94 

9.03 

9 

7.65 

7.78 

7.91 

8.03 

8.13 8.23 

8.33 

8.41 

8.49 

8.57 

10 

7.36 

7.49 

7.60 

7.71 

7.81 7.91 

7.99 

8.08 

8.15 

8.23 

11 

7.13 

7.25 

7.36 

7.46 

7.56 7.65 

7.73 

7.81 

7.88 

7.95 

12 

6.94 

7.06 

7.17 

7.26 

7.36 7.44 

7.52 

7.59 

7.66 

7.73 

13 

6.79 

6.90 

7.01 

7.10 

7.19 7.27 

7.35 

7.42 

7.48 

7.55 

14 

6.66 

6.77 

6.87 

6.96 

7.05 7.13 

7.20 

7.27 

7.33 

7.39 

15 

6.55 

6.66 

6.76 

6.84 

6.93 7.00 

7.07 

7.14 

7.20 

7.26 

16 

6.46 

6.56 

6.66 

6.74 

6.82 6.90 

6.97 

7.03 

7.09 

7.15 

17 

6.38 

6.48 

6.57 

6.66 

6.73 6.81 

6.87 

6.94 

7.00 

7.05 

18 

6.31 

6.41 

6.50 

6.58 

6.65 6.73 

6.79 

6.85 

6.91 

6.97 

19 

6.25 

6.34 

6.43 

6.51 

6.58 6.65 

6.72 

6.78 

6.84 

6.89 

20 

6.19 

6.28 

6.37 

6.45 

6.52 6.59 

6.65 

6.71 

6.77 

6.82 

24 

6.02 

6.11 

6.19 

6.26 

6.33 6.39 

6.45 

6.51 

6.56 

6.61 

30 

5.85 

5.93 

6.01 

6.08 

6.14 6.20 

6.26 

6.31 

6.36 

6.41 

40 

5.69 

5.76 

5.83 

5.90 

5.96 6.02 

6.07 

6.12 

6.16 

6.21 

60 

5.53 

5.60 

5.67 

5.73 

5.78 5.84 

5.89 

5.93 

5.97 

6.01 

120 

5.37 

5.44 

5.50 

5.56 

5.61 5.66 

5.71 

5.75 

5.79 

5.83 

∞

5.23 

5.29 

5.35 

5.40 

5.45 5.49 

5.54 

5.57 

5.61 

5.65 


TABLE I 

Transformation 
of r to z 


The body of the table contains values of z = 0.5 ln[(1 + r)/(1 - r)] = tanh⁻¹ r for corresponding
values of r, the correlation coefficient. For negative values of r, put a minus sign in
front of the tabled numbers.


 r       0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09

0.0   0.00000  0.01000  0.02000  0.03001  0.04002  0.05004  0.06007  0.07011  0.08017  0.09024
0.1   0.10034  0.11045  0.12058  0.13074  0.14093  0.15114  0.16139  0.17167  0.18198  0.19234
0.2   0.20273  0.21317  0.22366  0.23419  0.24477  0.25541  0.26611  0.27686  0.28768  0.29857
0.3   0.30952  0.32055  0.33165  0.34283  0.35409  0.36544  0.37689  0.38842  0.40006  0.41180
0.4   0.42365  0.43561  0.44769  0.45990  0.47223  0.48470  0.49731  0.51007  0.52298  0.53606
0.5   0.54931  0.56273  0.57634  0.59015  0.60416  0.61838  0.63283  0.64752  0.66246  0.67767
0.6   0.69315  0.70892  0.72501  0.74142  0.75817  0.77530  0.79281  0.81074  0.82911  0.84796
0.7   0.86730  0.88718  0.90764  0.92873  0.95048  0.97296  0.99622  1.02033  1.04537  1.07143
0.8   1.09861  1.12703  1.15682  1.18814  1.22117  1.25615  1.29334  1.33308  1.37577  1.42193
0.9   1.47222  1.52752  1.58903  1.65839  1.73805  1.83178  1.94591  2.09230  2.29756  2.64665
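
For readers who prefer to compute the transformation rather than read it from Table I, the following is a minimal Python sketch of the formula stated above the table. The function name `fisher_z` is our own label, not something used in the text; the sign rule for negative r falls out of the formula automatically.

```python
import math


def fisher_z(r: float) -> float:
    """Return z = 0.5 * ln((1 + r) / (1 - r)) for a correlation -1 < r < 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


# Compare with the tabled entry for r = 0.50 (row 0.5, column 0.00).
print(round(fisher_z(0.50), 5))
# A negative r gives the same magnitude with a minus sign in front.
print(round(fisher_z(-0.50), 5))
```

The same quantity is available directly as `math.atanh(r)`; the explicit logarithm is used here only to mirror the formula in the table heading.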



TABLE Ja 
Table of critical 
values of r in the 
runs test 


Table Ja and Table Jb contain critical values of r for various values of n1 and n2. For the
one-sample runs test, any value of r that is equal to or smaller than that shown in Table Ja or
equal to or larger than that shown in Table Jb is significant at the 0.05 level.
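
The decision rule just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `count_runs` and `runs_test_significant` are hypothetical, and the two critical values must still be read from Tables Ja and Jb for the observed n1 and n2; none are supplied here.

```python
def count_runs(sequence):
    """Count runs: maximal stretches of identical adjacent symbols."""
    if not sequence:
        return 0
    runs = 1
    for previous, current in zip(sequence, sequence[1:]):
        if current != previous:
            runs += 1
    return runs


def runs_test_significant(r, lower_critical, upper_critical):
    """Apply the rule from the text.

    lower_critical is the entry read from Table Ja and upper_critical the
    entry read from Table Jb for the observed n1 and n2 (assumed inputs).
    """
    return r <= lower_critical or r >= upper_critical


# The sequence AABBBABB splits into the runs AA, BBB, A, BB.
print(count_runs("AABBBABB"))
```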



TABLE Jb 
Table of critical 
values of r in the 
runs test 




TABLE K 
d-factors for
Wilcoxon signed 
rank test 


(a' = one-sided significance level, a" = two-sided significance level) 


n 

d 

a" 

a' 

n 

d 

a" 

a'

n 

d 

a" 

a'

3 

1 

.250 

.125 

13 

10 

.008 

.004 

20 

38 

.009 

.005 

4 

1 

.125 

.063 


11 

.010 

.005 


39 

.011 

.005 

5 

1 

.062 

.031 


18 

.048 

.024 


53 

.048 

.024 


2 

.125 

.063 


19 

.057 

.029 


54 

.053 

.027 

6 

1 

.031 

.016 


22 

.094 

.047 


61 

.097 

.049 


2 

.063 

.031 


23 

.110 

.055 


62 

.105 

.053 


3 

.094 

.047 

14 

13 

.009 

.004 

21 

43 

.009 

.005 


4 

.156 

.078 


14 

.011 

.005 


44 

.010 

.005 

7 

1 

.016 

.008 


22 

.049 

.025 


59 

.046 

.023 


2 

.031 

.016 


23 

.058 

.029 


60 

.050 

.025 


4 

.078 

.039 


26 

.091 

.045 


68 

.096 

.048 


5 

.109 

.055 


27 

.104 

.052 


69 

.103 

.052 

8 

1 

.008 

.004 

15 

16 

.008 

.004 

22 

49 

.009 

.005 


2 

.016 

.008 


17 

.010 

.005 


50 

.010 

.005 


4 

.039 

.020 


26 

.048 

.024 


66 

.046 

.023 


5 

.055 

.027 


27 

.055 

.028 


67 

.050 

.025 


6 

.078 

.039 


31 

.095 

.047 


76 

.098 

.049 


7 

.109 

.055 


32 

.107 

.054 


77 

.105 

.053 

9 

2 

.008 

.004 

16 

20 

.009 

.005 

23 

55 

.009 

.005 


3 

.012 

.006 


21 

.011 

.006 


56 

.010 

.005 


6 

.039 

.020 


30 

.044 

.022 


74 

.048 

.024 


7 

.055 

.027 


31 

.051 

.025 


75 

.052 

.026 


9 

.098 

.049 


36 

.093 

.047 


84 

.098 

.049 


10 

.129 

.065 


37 

.105 

.052 


85 

.105 

.052 

10 

4 

.010 

.005 

17 

24 

.009 

.005 

24 

62 

.010 

.005 


5 

.014 

.007 


25 

.011 

.006 


63 

.011 

.005 


9 

.049 

.024 


35 

.045 

.022 


82 

.049 

.025 


10 

.064 

.032 


36 

.051 

.025 


83 

.053 

.026 


11 

.084 

.042 


42 

.098 

.049 


92 

.095 

.048 


12 

.105 

.053 


43 

.109 

.054 


93 

.101 

.051 

11 

6 

.010 

.005 

18 

28 

.009 

.005 

25 

69 

.010 

.005 


7 

.014 

.007 


29 

.010 

.005 


70 

.011 

.005 


11 

.042 

.021 


41 

.048 

.024 


90 

.048 

.024 


12 

.054 

.027 


42 

.054 

.027 


91 

.052 

.026 


14 

.083 

.042 


48 

.099 

.049 


101 

.096 

.048 


15 

.102 

.051 


49 

.108 

.054 


102 

.101 

.051 

12 

8 

.009 

.005 

19 

33 

.009 

.005 






9 

.012 

.006 


34 

.011 

.005 






14 

.042 

.021 


47 

.049 

.025 






15 

.052 

.026 


48 

.055 

.027 






18 

.092 

.046 


54 

.096 

.048 






19 

.110 

.055 


55 

.104 

.052 





Note: For n > 25 use $d \approx \tfrac{1}{2}\left[\tfrac{1}{2}n(n+1) + 1 - z\sqrt{n(n+1)(2n+1)/6}\right]$, where z is read from Table C.
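
The large-sample note can be evaluated with a few lines of Python. This sketch assumes the note reads d ≈ ½[½n(n+1) + 1 − z√(n(n+1)(2n+1)/6)], which agrees with the usual mean n(n+1)/4 and variance n(n+1)(2n+1)/24 of the signed-rank statistic; the function name `wilcoxon_d_approx` is our own, and z is supplied by the reader from Table C.

```python
import math


def wilcoxon_d_approx(n: int, z: float) -> float:
    """Approximate d-factor for the Wilcoxon signed-rank test (intended for n > 25).

    z is the standard normal value for the chosen significance level,
    taken from Table C (an input assumed to be supplied by the reader).
    """
    half_rank_sum = 0.5 * n * (n + 1)
    spread = math.sqrt(n * (n + 1) * (2 * n + 1) / 6.0)
    return 0.5 * (half_rank_sum + 1.0 - z * spread)


# Sanity check at the edge of the table: n = 25 with z = 1.96 gives a value
# close to the tabled d = 90 (two-sided level .048).
print(wilcoxon_d_approx(25, 1.96))
```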



TABLE L

Quantiles of the 
Mann-Whitney 
test statistic 


n1

p

n2 = 2

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 


.001 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 


.005 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

1 

2 

.01 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

1 

1 

1 

1 

1 

2 

2 


.025 

0 

0 

0 

0 

0 

0 

1 

1 

1 

1 

2 

2 

2 

2 

2 

3 

3 

3 

3 


.05 

0 

0 

0 

1 

1 

1 

2 

2 

2 

2 

3 

3 

4 

4 

4 

4 

5 

5 

5 


.10 

0 

1 

1 

2 

2 

2 

3 

3 

4 

4 

5 

5 

5 

6 

6 

7 

7 

8 

8 


.001 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

1 

1 

1 


.005 

0 

0 

0 

0 

0 

0 

0 

1 

1 

1 

2 

2 

2 

3 

3 

3 

3 

4 

4 

3 

.01 

0 

0 

0 

0 

0 

1 

1 

2 

2 

2 

3 

3 

3 

4 

4 

5 

5 

5 

6 


.025 

0 

0 

0 

1 

2 

2 

3 

3 

4 

4 

5 

5 

6 

6 

7 

7 

8 

8 

9 


.05 

0 

1 

1 

2 

3 

3 

4 

5 

5 

6 

6 

7 

8 

8 

9 

10 

10 

11 

12 


.10 

1 

2 

2 

3 

4 

5 

6 

6 

7 

8 

9 

10 

11 

11 

12 

13 

14 

15 

16 


.001 

0 

0 

0 

0 

0 

0 

0 

0 

1 

1 

1 

2 

2 

2 

3 

3 

4 

4 

4 


.005 

0 

0 

0 

0 

1 

1 

2 

2 

3 

3 

4 

4 

5 

6 

6 

7 

7 

8 

9 

4 

.01 

0 

0 

0 

1 

2 

2 

3 

4 

4 

5 

6 

6 

7 

8

8 

9 

10 

10 

11 


.025 

0 

0 

1 

2 

3 

4 

5 

5 

6 

7 

8 

9 

10 

11 

12 

12 

13 

14 

15 


.05 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

15 

16 

17 

18 

19 


.10 

1 

2 

4 

5 

6 

7 

8 

10 

11 

12 

13 

14 

16 

17 

18 

19 

21 

22 

23 


.001 

0 

0 

0 

0 

0 

0 

1 

2 

2 

3 

3 

4 

4 

5 

6 

6 

7 

8 

8 


.005 

0 

0 

0 

1 

2 

2 

3 

4 

5 

6 

7 

8 

8 

9 

10 

11 

12 

13 

14 

5 

.01 

0 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

17 


.025 

0 

1 

2 

3 

4 

6 

7 

8 

9 

10 

12 

13 

14 

15 

16 

18 

19 

20 

21 


.05 

1 

2 

3 

5 

6 

7 

9 

10 

12 

13 

14 

16 

17 

19 

20 

21 

23 

24 

26 


.10 

2 

3 

5 

6 

8 

9 

11 

13 

14 

16 

18 

19 

21 

23 

24 

26 

28 

29 

31 


.001 

0 

0 

0 

0 

0 

0 

2 

3 

4 

5 

5 

6 

7 

8 

9 

10 

11 

12 

13 


.005 

0 

0 

1 

2 

3 

4 

5 

6 

7 

8 

10 

11 

12 

13 

14 

16 

17 

18 

19 

6 

.01 

0 

0 

2 

3 

4 

5 

7 

8 

9 

10 

12 

13 

14 

16 

17 

19 

20 

21 

23 


.025 

0 

2 

3 

4 

6 

7 

9 

11 

12 

14 

15 

17 

18 

20 

22 

23 

25 

26 

28 


.05 

1 

3 

4 

6 

8 

9 

11 

13 

15 

17 

18 

20 

22 

24 

26 

27 

29 

31 

33 


.10 

2 

4 

6 

8 

10 

12 

14 

16 

18 

20 

22 

24 

26 

28 

30 

32 

35 

37 

39 


.001 

0 

0 

0 

0 

1 

2 

3 

4 

6 

7 

8 

9 

10 

11 

12 

14 

15 

16 

17 


.005 

0 

0 

1 

2 

4 

5 

7 

8 

10 

11 

13 

14 

16 

17 

19 

20 

22 

23 

25 

7 

.01 

0 

1 

2 

4 

5 

7 

8 

10 

12 

13 

15 

17 

18 

20 

22 

24 

25 

27 

29 


.025 

0 

2 

4 

6 

7 

9 

11 

13 

15 

17 

19 

21 

23 

25 

27 

29 

31 

33 

35 


.05 

1 

3 

5 

7 

9 

12 

14 

16 

18 

20 

22 

25 

27 

29 

31 

34 

36 

38 

40 


.10 

2 

5 

7 

9 

12 

14 

17 

19 

22 

24 

27 

29 

32 

34 

37 

39 

42 

44 

47 


.001 

0 

0 

0 

1 

2 

3 

5 

6 

7 

9 

10 

12 

13 

15 

16 

18 

19 

21 

22 


.005 

0 

0 

2 

3 

5 

7 

8 

10 

12 

14 

16 

18 

19 

21 

23 

25 

27 

29 

31 

8 

.01 

0 

1 

3 

5 

7 

8 

10 

12 

14 

16 

18 

21 

23 

25 

27 

29 

31 

33 

35 


.025 

1 

3 

5 

7 

9 

11 

14 

16 

18 

20 

23 

25 

27 

30 

32 

35 

37 

39 

42 


.05 

2 

4 

6 

9 

11 

14 

16 

19 

21 

24 

27 

29 

32 

34 

37 

40 

42 

45 

48 


.10 

3 

6 

8 

11 

14 

17 

20 

23 

25 

28 

31 

34 

37 

40 

43 

46 

49 

52 

55 


.001 

0 

0 

0 

2 

3 

4 

6 

8 

9 

11 

13 

15 

16 

18 

20 

22 

24 

26 

27 


.005 

0 

1 

2 

4 

6 

8 

10 

12 

14 

17 

19 

21 

23 

25 

28 

30 

32 

34 

37 

9 

.01 

0 

2 

4 

6 

8 

10 

12 

15 

17 

19 

22 

24 

27 

29 

32 

34 

37 

39 

41 


.025 

1 

3 

5 

8 

11 

13 

16 

18 

21 

24 

27 

29 

32 

35 

38 

40 

43 

46 

49 


.05 

2 

5 

7 

10 

13 

16 

19 

22 

25 

28 

31 

34 

37 

40 

43 

46 

49 

52 

55 


.10 

3 

6 

10 

13 

16 

19 

23 

26 

29 

32 

36 

39 

42 

46 

49 

53 

56 

59 

63 


.001 

0 

0 

1 

2 

4 

6 

7 

9 

11 

13 

15 

18 

20 

22 

24 

26 

28 

30 

33 


.005 

0 

1 

3 

5 

7 

10 

12 

14 

17 

19 

22 

25 

27 

30 

32 

35 

38 

40 

43 

10 

.01 

0 

2 

4 

7 

9 

12 

14 

17 

20 

23 

25 

28 

31 

34 

37 

39 

42 

45 

48 


.025 

1 

4 

6 

9 

12 

15 

18 

21 

24 

27 

30 

34 

37 

40 

43 

46 

49 

53 

56 


.05 

2 

5 

8 

12 

15 

18 

21 

25 

28 

32 

35 

38 

42 

45 

49 

52 

56 

59 

63 


.10 

4 

7 

11 

14 

18 

22 

25 

29 

33 

37 

40 

44 

48 

52 

55 

59 

63 

67 

71 









$n_1$ 

$p$ 

$n_2$ = 2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 


.001 

0 

0 

1 

3 

5 

7 

9 

11 

13 

16 

18 

21 

23 

25 

28 

30 

33 

35 

38 


.005 

0 

1 

3 

6 

8 

11 

14 

17 

19 

22 

25 

28 

31 

34 

37 

40 

43 

46 

49 

11 

.01 

0 

2 

5 

8 

10 

13 

16 

19 

23 

26 

29 

32 

35 

38 

42 

45 

48 

51 

54 


.025 

1 

4 

7 

10 

14 

17 

20 

24 

27 

31 

34 

38 

41 

45 

48 

52 

56 

59 

63 


.05 

2 

6 

9 

13 

17 

20 

24 

28 

32 

35 

39 

43 

47 

51 

55 

58 

62 

66 

70 


.10 

4 

8 

12 

16 

20 

24 

28 

32 

37 

41 

45 

49 

53 

58 

62 

66 

70 

74 

79 


.001 

0 

0 

1 

3 

5 

8 

10 

13 

15 

18 

21 

24 

26 

29 

32 

35 

38 

41 

43 


.005 

0 

2 

4 

7 

10 

13 

16 

19 

22 

25 

28 

32 

35 

38 

42 

45 

48 

52 

55 

12 

.01 

0 

3 

6 

9 

12 

15 

18 

22 

25 

29 

32 

36 

39 

43 

47 

50 

54 

57 

61 


.025 

2 

5 

8 

12 

15 

19 

23 

27 

30 

34 

38 

42 

46 

50 

54 

58 

62 

66 

70 


.05 

3 

6 

10 

14 

18 

22 

27 

31 

35 

39 

43 

48 

52 

56 

61 

65 

69 

73 

78 


.10 

5 

9 

13 

18 

22 

27 

31 

36 

40 

45 

50 

54 

59 

64 

68 

73 

78 

82 

87 


.001 

0 

0 

2 

4 

6 

9 

12 

15 

18 

21 

24 

27 

30 

33 

36 

39 

43 

46 

49 


.005 

0 

2 

4 

8 

11 

14 

18 

21 

25 

28 

32 

35 

39 

43 

46 

50 

54 

58 

61 

13 

.01 

1 

3 

6 

10 

13 

17 

21 

24 

28 

32 

36 

40 

44 

48 

52 

56 

60 

64 

68 


.025 

2 

5 

9 

13 

17 

21 

25 

29 

34 

38 

42 

46 

51 

55 

60 

64 

68 

73 

77 


.05 

3 

7 

11 

16 

20 

25 

29 

34 

38 

43 

48 

52 

57 

62 

66 

71 

76 

81 

85 


.10 

5 

10 

14 

19 

24 

29 

34 

39 

44 

49 

54 

59 

64 

69 

75 

80 

85 

90 

95 


.001 

0 

0 

2 

4 

7 

10 

13 

16 

20 

23 

26 

30 

33 

37 

40 

44 

47 

51 

55 


.005 

0 

2 

5 

8 

12 

16 

19 

23 

27 

31 

35 

39 

43 

47 

51 

55 

59 

64 

68 

14 

.01 

1 

3 

7 

11 

14 

18 

23 

27 

31 

35 

39 

44 

48 

52 

57 

61 

66 

70 

74 


.025 

2 

6 

10 

14 

18 

23 

27 

32 

37 

41 

46 

51 

56 

60 

65 

70 

75 

79 

84 


.05 

4 

8 

12 

17 

22 

27 

32 

37 

42 

47 

52 

57 

62 

67 

72 

78 

83 

88 

93 


.10 

5 

11 

16 

21 

26 

32 

37 

42 

48 

53 

59 

64 

70 

75 

81 

86 

92 

98 

103 


.001 

0 

0 

2 

5 

8 

11 

15 

18 

22 

25 

29 

33 

37 

41 

44 

48 

52 

56 

60 


.005 

0 

3 

6 

9 

13 

17 

21 

25 

30 

34 

38 

43 

47 

52 

56 

61 

65 

70 

74 

15 

.01 

1 

4 

8 

12 

16 

20 

25 

29 

34 

38 

43 

48 

52 

57 

62 

67 

71 

76 

81 


.025 

2 

6 

11 

15 

20 

25 

30 

35 

40 

45 

50 

55 

60 

65 

71 

76 

81 

86 

91 


.05 

4 

8 

13 

19 

24 

29 

34 

40 

45 

51 

56 

62 

67 

73 

78 

84 

89 

95 

101 


.10 

6 

11 

17 

23 

28 

34 

40 

46 

52 

58 

64 

69 

75 

81 

87 

93 

99 

105 

111 


.001 

0 

0 

3 

6 

9 

12 

16 

20 

24 

28 

32 

36 

40 

44 

49 

53 

57 

61 

66 


.005 

0 

3 

6 

10 

14 

19 

23 

28 

32 

37 

42 

46 

51 

56 

61 

66 

71 

75 

80 

16 

.01 

1 

4 

8 

13 

17 

22 

27 

32 

37 

42 

47 

52 

57 

62 

67 

72 

77 

83 

88 


.025 

2 

7 

12 

16 

22 

27 

32 

38 

43 

48 

54 

60 

65 

71 

76 

82 

87 

93 

99 


.05 

4 

9 

15 

20 

26 

31 

37 

43 

49 

55 

61 

66 

72 

78 

84 

90 

96 

102 

108 


.10 

6 

12 

18 

24 

30 

37 

43 

49 

55 

62 

68 

75 

81 

87 

94 

100 

107 

113 

120 


.001 

0 

1 

3 

6 

10 

14 

18 

22 

26 

30 

35 

39 

44 

48 

53 

58 

62 

67 

71 


.005 

0 

3 

7 

11 

16 

20 

25 

30 

35 

40 

45 

50 

55 

61 

66 

71 

76 

82 

87 

17 

.01 

1 

5 

9 

14 

19 

24 

29 

34 

39 

45 

50 

56 

61 

67 

72 

78 

83 

89 

94 


.025 

3 

7 

12 

18 

23 

29 

35 

40 

46 

52 

58 

64 

70 

76 

82 

88 

94 

100 

106 


.05 

4 

10 

16 

21 

27 

34 

40 

46 

52 

58 

65 

71 

78 

84 

90 

97 

103 

110 

116 


.10 

7 

13 

19 

26 

32 

39 

46 

53 

59 

66 

73 

80 

86 

93 

100 

107 

114 

121 

128 


.001 

0 

1 

4 

7 

11 

15 

19 

24 

28 

33 

38 

43 

47 

52 

57 

62 

67 

72 

77 


.005 

0 

3 

7 

12 

17 

22 

27 

32 

38 

43 

48 

54 

59 

65 

71 

76 

82 

88 

93 

18 

.01 

1 

5 

10 

15 

20 

25 

31 

37 

42 

48 

54 

60 

66 

71 

77 

83 

89 

95 

101 


.025 

3 

8 

13 

19 

25 

31 

37 

43 

49 

56 

62 

68 

75 

81 

87 

94 

100 

107 

113 


.05 

5 

10 

17 

23 

29 

36 

42 

49 

56 

62 

69 

76 

83 

89 

96 

103 

110 

117 

124 


.10 

7 

14 

21 

28 

35 

42 

49 

56 

63 

70 

78 

85 

92 

99 

107 

114 

121 

129 

136 


.001 

0 

1 

4 

8 

12 

16 

21 

26 

30 

35 

41 

46 

51 

56 

61 

67 

72 

78 

83 


.005 

1 

4 

8 

13 

18 

23 

29 

34 

40 

46 

52 

58 

64 

70 

75 

82 

88 

94 

100 

19 

.01 

2 

5 

10 

16 

21 

27 

33 

39 

45 

51 

57 

64 

70 

76 

83 

89 

95 

102 

108 


.025 

3 

8 

14 

20 

26 

33 

39 

46 

53 

59 

66 

73 

79 

86 

93 

100 

107 

114 

120 


.05 

5 

11 

18 

24 

31 

38 

45 

52 

59 

66 

73 

81 

88 

95 

102 

110 

117 

124 

131 


.10 

8 

15 

22 

29 

37 

44 

52 

59 

67 

74 

82 

90 

98 

105 

113 

121 

129 

136 

144 


.001 

0 

1 

4 

8 

13 

17 

22 

27 

33 

38 

43 

49 

55 

60 

66 

71 

77 

83 

89 


.005 

1 

4 

9 

14 

19 

25 

31 

37 

43 

49 

55 

61 

68 

74 

80 

87 

93 

100 

106 

20 

.01 

2 

6 

11 

17 

23 

29 

35 

41 

48 

54 

61 

68 

74 

81 

88 

94 

101 

108 

115 


.025 

3 

9 

15 

21 

28 

35 

42 

49 

56 

63 

70 

77 

84 

91 

99 

106 

113 

120 

128 


.05 

5 

12 

19 

26 

33 

40 

48 

55 

63 

70 

78 

85 

93 

101 

108 

116 

124 

131 

139 


.10 

8 

16 

23 

31 

39 

47 

55 

63 

71 

79 

87 

95 

103 

111 

120 

128 

136 

144 

152 






Sample sizes 


Critical 

value 


Sample sizes 


Critical 

value 


TABLE M 
Critical values of the Kruskal-Wallis test statistic 


/?2 n 3 


1 1 


3 3 1 


3 3 2 


3 3 3 


1 1 
2 1 


2 2 


3 1 


3 2 


3 3 


2.7000 

3.6000 
4.5714 

3.7143 

3.2000 
4.2857 
3.8571 
5.3572 

4.7143 
4.5000 
4.4643 
5.1429 

4.5714 
4.0000 

6.2500 
5.3611 
5.1389 
4.5556 

4.2500 

7.2000 
6.4889 
5.6889 

5.6000 
5.0667 
4.6222 

3.5714 
4.8214 
4.5000 
4.0179 
6.0000 
5.3333 
5.1250 
4.4583 
4.1667 
5.8333 
5.2083 
5.0000 
4.0556 
3.8889 

6.4444 
6.3000 

5.4444 
5.4000 
4.5111 

4.4444 
6.7455 

6.7091 
5.7909 
5.7273 

4.7091 


4 3 


4 4 


1 1 
2 1 


2 2 


3 1 


3 2 


4.7000 

6.6667 

6.1667 
4.9667 
4.8667 

4.1667 
4.0667 
7.0364 
6.8727 
5.4545 
5.2364 
4.5545 
4.4455 
7.1439 
7.1364 
5.5985 
5.5758 
4.5455 
4.4773 

7.6538 
7.5385 
5.6923 

5.6538 

4.6539 
4.5001 
3.8571 
5.2500 
5.0000 
4.4500 
4.2000 
4.0500 
6.5333 
6.1333 
5.1600 
5.0400 
4.3733 
4.2933 
6.4000 
4.9600 
4.8711 
4.0178 
3.8400 
6.9091 
6.8218 
5.2509 
5.1055 
4.6509 
4.4945 
7.0788 
6.9818 


5 


3 


3 


TABLE M (continued) 

Sample sizes ($n_1$, $n_2$, $n_3$) | Critical value | $\alpha$ | Sample sizes ($n_1$, $n_2$, $n_3$) | Critical value | $\alpha$ 

5 

3 

3 

5.6485 

0.049 

5 

5 

1 

6.8364 

0.011 




5.5152 

0.051 




5.1273 

0.046 




4.5333 

0.097 




4.9091 

0.053 




4.4121 

0.109 




4.1091 

0.086 

5 

4 

1 

6.9545 

0.008 




4.0364 

0.105 




6.8400 

0.011 

5 

5 

2 

7.3385 

0.010 




4.9855 

0.044 




7.2692 

0.010 




4.8600 

0.056 




5.3385 

0.047 




3.9873 

0.098 




5.2462 

0.051 




3.9600 

0.102 




4.6231 

0.097 

5 

4 

2 

7.2045 

0.009 




4.5077 

0.100 




7.1182 

0.010 

5 

5 

3 

7.5780 

0.010 




5.2727 

0.049 




7.5429 

0.010 




5.2682 

0.050 




5.7055 

0.046 




4.5409 

0.098 




5.6264 

0.051 




4.5182 

0.101 




4.5451 

0.100 

5 

4 

3 

7.4449 

0.010 




4.5363 

0.102 




7.3949 

0.011 

5 

5 

4 

7.8229 

0.010 




5.6564 

0.049 




7.7914 

0.010 




5.6308 

0.050 




5.6657 

0.049 




4.5487 

0.099 




5.6429 

0.050 




4.5231 

0.103 




4.5229 

0.099 

5 

4 

4 

7.7604 

0.009 




4.5200 

0.101 




7.7440 

0.011 

5 

5 

5 

8.0000 

0.009 




5.6571 

0.049 




7.9800 

0.010 




5.6176 

0.050 




5.7800 

0.049 




4.6187 

0.100 




5.6600 

0.051 




4.5527 

0.102 




4.5600 

0.100 

5 

5 

1 

7.3091 

0.009 




4.5000 

0.102 



TABLE Na 
Exact distribution of $\chi_r^2$ for tables with 2 to 9 sets of three ranks (k = 3; n = 2, 3, 4, 5, 6, 7, 8, 9) 


p is the probability of obtaining a value of $\chi_r^2$ as great as or greater than the 
corresponding value of $\chi_r^2$. 


n = 2 

n = 3 

n = 4 

n = 5 

$\chi_r^2$ 

p 

$\chi_r^2$ 

p 

$\chi_r^2$ 

p 

$\chi_r^2$ 

p 

0 

1.000 

0.000 

1.000 

0.0 

1.000 

0.0 

1.000 

1 

0.833 

0.667 

0.944 

0.5 

0.931 

0.4 

0.954 

3 

0.500 

2.000 

0.528 

1.5 

0.653 

1.2 

0.691 

4 

0.167 

2.667 

0.361 

2.0 

0.431 

1.6 

0.522 



4.667 

0.194 

3.5 

0.273 

2.8 

0.367 



6.000 

0.028 

4.5 

0.125 

3.6 

0.182 





6.0 

0.069 

4.8 

0.124 





6.5 

0.042 

5.2 

0.093 





8.0 

0.0046 

6.4 

0.039 







7.6 

0.024 







8.4 

0.0085 







10.0 

0.00077 

n = 6 

n = 7 

n = 8 

n = 9 

$\chi_r^2$ 

p 

$\chi_r^2$ 

p 

$\chi_r^2$ 

p 

$\chi_r^2$ 

p 

0.00 

1.000 

0.000 

1.000 

0.00 

1.000 

0.000 

1.000 

0.33 

0.956 

0.286 

0.964 

0.25 

0.967 

0.222 

0.971 

1.00 

0.740 

0.857 

0.768 

0.75 

0.794 

0.667 

0.814 

1.33 

0.570 

1.143 

0.620 

1.00 

0.654 

0.889 

0.865 

2.33 

0.430 

2.000 

0.486 

1.75 

0.531 

1.556 

0.569 

3.00 

0.252 

2.571 

0.305 

2.25 

0.355 

2.000 

0.398 

4.00 

0.184 

3.429 

0.237 

3.00 

0.285 

2.667 

0.328 

4.33 

0.142 

3.714 

0.192 

3.25 

0.236 

2.889 

0.278 

5.33 

0.072 

4.571 

0.112 

4.00 

0.149 

3.556 

0.187 

6.33 

0.052 

5.429 

0.085 

4.75 

0.120 

4.222 

0.154 

7.00 

0.029 

6.000 

0.052 

5.25 

0.079 

4.667 

0.107 

8.33 

0.012 

7.143 

0.027 

6.25 

0.047 

5.556 

0.069 

9.00 

0.0081 

7.714 

0.021 

6.75 

0.038 

6.000 

0.057 

9.33 

0.0055 

8.000 

0.016 

7.00 

0.030 

6.222 

0.048 

10.33 

0.0017 

8.857 

0.0084 

7.75 

0.018 

6.889 

0.031 

12.00 

0.00013 

10.286 

0.0036 

9.00 

0.0099 

8.000 

0.019 



10.571 

0.0027 

9.25 

0.0080 

8.222 

0.016 



11.143 

0.0012 

9.75 

0.0048 

8.667 

0.010 



12.286 

0.00032 

10.75 

0.0024 

9.556 

0.0060 



14.000 

0.000021 

12.00 

0.0011 

10.667 

0.0035 





12.25 

0.00086 

10.889 

0.0029 





13.00 

0.00026 

11.556 

0.0013 





14.25 

0.000061 

12.667 

0.00066 





16.00 

0.0000036 

13.556 

0.00035 







14.000 

0.00020 







14.222 

0.000097 







14.889 

0.000054 







16.222 

0.000011 







18.000 

0.0000006 



TABLE Nb 
Exact distribution of $\chi_r^2$ for tables with 2 to 4 sets of four ranks (k = 4; n = 2, 3, 4) 


p is the probability of obtaining a value of $\chi_r^2$ as great as or greater than the 
corresponding value of $\chi_r^2$. 


n = 2 


$\chi_r^2$   p 


0.0 

1.000 

0.6 

0.958 

1.2 

0.834 

1.8 

0.792 

2.4 

0.625 

3.0 

0.542 

3.6 

0.458 

4.2 

0.375 

4.8 

0.208 

5.4 

0.167 

6.0 

0.042 


n = 3 

n = 4 

$\chi_r^2$ 

p 

$\chi_r^2$ 

p 

$\chi_r^2$ 

p 

0.2 

1.000 

0.0 

1.000 

5.7 

0.141 

0.6 

0.958 

0.3 

0.992 

6.0 

0.105 

1.0 

0.910 

0.6 

0.928 

6.3 

0.094 

1.8 

0.727 

0.9 

0.900 

6.6 

0.077 

2.2 

0.608 

1.2 

0.800 

6.9 

0.068 

2.6 

0.524 

1.5 

0.754 

7.2 

0.054 

3.4 

0.446 

1.8 

0.677 

7.5 

0.052 

3.8 

0.342 

2.1 

0.649 

7.8 

0.036 

4.2 

0.300 

2.4 

0.524 

8.1 

0.033 

5.0 

0.207 

2.7 

0.508 

8.4 

0.019 

5.4 

0.175 

3.0 

0.432 

8.7 

0.014 

5.8 

0.148 

3.3 

0.389 

9.3 

0.012 

6.6 

0.075 

3.6 

0.355 

9.6 

0.0069 

7.0 

0.054 

3.9 

0.324 

9.9 

0.0062 

7.4 

0.033 

4.5 

0.242 

10.2 

0.0027 

8.2 

0.017 

4.8 

0.200 

10.8 

0.0016 

9.0 

0.0017 

5.1 

0.190 

11.1 

0.00094 



5.4 

0.158 

12.0 

0.000072 



TABLE O 
Critical values of the Spearman test statistic 


Approximate upper-tail critical values, $r_s^*$, where $P(r_s > r_s^*) \le \alpha$, n = 4(1)30 

Significance level, $\alpha$ 


n 

0.001 

0.005 

0.010 

0.025 

0.050 

0.100 

4 

5 

6 


.9429 

.9000 

.8857 

.9000 

.8286 

.8000 

.8000 

.7714 

.8000 

.7000 

.6000 

7 

.9643 

.8929 

.8571 

.7450 

.6786 

.5357 

8 

.9286 

.8571 

.8095 

.7143 

.6190 

.5000 

9 

.9000 

.8167 

.7667 

.6833 

.5833 

.4667 

10 

.8667 

.7818 

.7333 

.6364 

.5515 

.4424 

11 

.8364 

.7545 

.7000 

.6091 

.5273 

.4182 

12 

.8182 

.7273 

.6713 

.5804 

.4965 

.3986 

13 

.7912 

.6978 

.6429 

.5549 

.4780 

.3791 

14 

.7670 

.6747 

.6220 

.5341 

.4593 

.3626 

15 

.7464 

.6536 

.6000 

.5179 

.4429 

.3500 

16 

.7265 

.6324 

.5824 

.5000 

.4265 

.3382 

17 

.7083 

.6152 

.5637 

.4853 

.4118 

.3260 

18 

.6904 

.5975 

.5480 

.4716 

.3994 

.3148 

19 

.6737 

.5825 

.5333 

.4579 

.3895 

.3070 

20 

.6586 

.5684 

.5203 

.4451 

.3789 

.2977 

21 

.6455 

.5545 

.5078 

.4351 

.3688 

.2909 

22 

.6318 

.5426 

.4963 

.4241 

.3597 

.2829 

23 

.6186 

.5306 

.4852 

.4150 

.3518 

.2767 

24 

.6070 

.5200 

.4748 

.4061 

.3435 

.2704 

25 

.5962 

.5100 

.4654 

.3977 

.3362 

.2646 

26 

.5856 

.5002 

.4564 

.3894 

.3299 

.2588 

27 

.5757 

.4915 

.4481 

.3822 

.3236 

.2540 

28 

.5660 

.4828 

.4401 

.3749 

.3175 

.2490 

29 

.5567 

.4744 

.4320 

.3685 

.3113 

.2443 

30 

.5479 

.4665 

.4251 

.3620 

.3059 

.2400 


Note: The corresponding lower-tail critical value for $r_s$ is $-r_s^*$. 
Appendix II 

Hypothetical Population of Employed Heads of Households 


VARIABLE 

1. Sex 

1 = male 

2 = female 

2. Marital status 

1 = single 

2 = married 

3 = widowed or divorced 

3. Age 

4. Occupation 

1 = professional 

2 = managerial 

3 = sales 

4 = clerical and technical 

5 = other 

5. Education (years of school completed) 

6. Commuting distance to work (miles) 

7. Number of years with current employer 

8. Annual income (thousands of dollars) 

9. Family size (number of persons) 

10. Size of residence (hundreds of square feet of floor space) 




Variable 


Subject 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

1 

1 

2 

29 

4 

14 

12 

5 

24 

4 

26 

2 

1 

2 

48 

1 

16 

34 

10 

27 

4 

22 

3 

1 

2 

41 

5 

12 

12 

11 

16 

4 

20 

4 

1 

2 

54 

4 

14 

18 

20 

23 

3 

20 

5 

1 

2 

44 

4 

12 

5 

20 

21 

3 

19 

6 

1 

2 

57 

3 

16 

30 

21 

44 

2 

36 

7 

1 

1 

45 

2 

16 

22 

22 

48 

1 

8 

8 

1 

2 

59 

2 

12 

20 

29 

32 

2 

25 

9 

1 

1 

18 

3 

12 

17 

1 

12 

1 

6 

10 

1 

2 

49 

3 

14 

24 

24 

37 

3 

32 

11 

1 

2 

43 

2 

16 

7 

18 

38 

3 

33 

12 

1 

2 

44 

4 

16 

14 

20 

31 

4 

30 

13 

1 

2 

43 

2 

12 

29 

15 

28 

4 

29 

14 

1 

2 

45 

5 

12 

28 

14 

17 

3 

18 

15 

1 

2 

50 

1 

16 

17 

23 

24 

3 

20 

16 

1 

1 

44 

4 

14 

28 

20 

22 

1 

19 

17 

1 

2 

52 

3 

12 

17 

21 

30 

3 

30 

18 

1 

2 

60 

2 

16 

22 

25 

60 

2 

39 

19 

1 

2 

43 

2 

16 

9 

12 

39 

4 

31 

20 

1 

2 

53 

4 

12 

26 

16 

18 

3 

17 

21 

1 

3 

45 

5 

16 

21 

20 

29 

1 

28 

22 

1 

2 

55 

3 

12 

5 

30 

27 

2 

20 

23 

1 

3 

20 

5 

14 

33 

2 

10 

1 

6 

24 

1 

2 

58 

4 

14 

32 

16 

24 

2 

19 

25 

1 

2 

43 

4 

12 

31 

21 

17 

4 

17 

26 

1 

3 

43 

3 

12 

34 

24 

24 

1 

26 

27 

2 

1 

21 

3 

12 

13 

2 

10 

1 

5 

28 

1 

2 

41 

2 

16 

31 

14 

38 

4 

32 

29 

1 

2 

44 

2 

16 

17 

12 

42 

3 

4 

30 

1 

2 

40 

4 

16 

17 

10 

29 

4 

28 

31 

1 

2 

56 

2 

12 

13 

16 

23 

2 

20 

32 

1 

2 

47 

3 

16 

29 

18 

41 

4 

34 

33 

1 

2 

51 

2 

12 

24 

16 

30 

3 

32 

34 

2 

3 

26 

2 

16 

17 

3 

16 

2 

24 

35 

2 

1 

18 

4 

12 

6 

1 

11 

1 

6 

36 

1 

2 

30 

2 

16 

5 

1 

31 

3 

27 

37 

1 

2 

49 

2 

12 

32 

9 

28 

3 

31 

38 

1 

2 

42 

4 

16 

33 

12 

24 

4 

20 

39 

1 

2 

50 

4 

14 

25 

14 

20 

3 

18 

40 

1 

2 

54 

3 

16 

5 

13 

37 

3 

34 

41 

1 

2 

45 

4 

14 

31 

23 

24 

4 

20 

42 

1 

2 

57 

2 

16 

11 

20 

60 

2 

38 

43 

2 

3 

25 

1 

17 

20 

3 

18 

2 

12 

44 

2 

1 

24 

1 

18 

10 

1 

19 

1 

5 

45 

1 

2 

59 

2 

16 

10 

19 

51 

2 

36 

46 

2 

1 

34 

1 

16 

13 

7 

14 

1 

7 

47 

1 

2 

43 

2 

12 

25 

16 

21 

4 

20 

48 

2 

3 

30 

2 

16 

14 

5 

12 

3 

12 

49 

1 

2 

45 

4 

16 

17 

15 

22 

3 

19 

50 

1 

2 

58 

3 

15 

33 

20 

31 

2 

33 


Variable 


Subject 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

51 

2 

1 

25 

1 

16 

4 

3 

15 

1 

5 

52 

1 

2 

29 

3 

16 

32 

2 

34 

5 

30 

53 

1 

2 

52 

2 

14 

32 

7 

28 

3 

25 

54 

1 

2 

18 

4 

12 

23 

1 

10 

2 

4 

55 

1 

2 

53 

4 

14 

12 

11 

21 

2 

20 

56 

1 

2 

55 

3 

16 

27 

15 

41 

2 

34 

57 

2 

3 

35 

3 

13 

19 

10 

12 

2 

9 

58 

1 

2 

40 

2 

12 

16 

10 

26 

4 

27 

59 

2 

1 

42 

4 

16 

30 

6 

13 

1 

5 

60 

1 

2 

44 

2 

16 

32 

15 

32 

3 

34 

61 

1 

2 

48 

4 

16 

27 

21 

30 

3 

32 

62 

1 

1 

20 

5 

12 

28 

2 

12 

1 

6 

63 

1 

2 

42 

2 

16 

9 

12 

38 

4 

34 

64 

1 

2 

60 

3 

14 

8 

20 

27 

2 

28 

65 

1 

3 

21 

3 

12 

16 

1 

12 

1 

5 

66 

1 

2 

56 

2 

15 

32 

26 

29 

2 

27 

67 

1 

2 

41 

2 

16 

20 

15 

34 

4 

33 

68 

1 

2 

51 

4 

14 

30 

20 

21 

3 

19 

69 

1 

2 

44 

4 

12 

20 

12 

20 

4 

18 

70 

1 

2 

47 

5 

12 

29 

17 

16 

3 

17 

71 

1 

2 

31 

2 

13 

14 

6 

35 

4 

38 

72 

1 

2 

20 

4 

13 

35 

3 

11 

3 

6 

73 

1 

2 

50 

3 

12 

34 

22 

27 

4 

23 

74 

1 

2 

47 

4 

16 

16 

16 

25 

3 

22 

75 

1 

2 

54 

4 

14 

32 

28 

23 

3 

21 

76 

2 

1 

31 

5 

12 

7 

11 

11 

1 

6 

77 

1 

2 

57 

4 

14 

27 

22 

24 

2 

21 

78 

2 

3 

29 

1 

18 

21 

5 

20 

2 

24 

79 

1 

2 

59 

5 

12 

9 

31 

17 

2 

27 

80 

1 

2 

42 

3 

16 

28 

19 

41 

3 

38 

81 

1 

2 

42 

3 

16 

9 

14 

38 

4 

37 

82 

2 

2 

48 

5 

12 

9 

8 

12 

4 

10 

83 

2 

2 

49 

4 

12 

7 

14 

13 

5 

16 

84 

1 

2 

55 

2 

16 

8 

15 

44 

3 

36 

85 

1 

2 

49 

4 

12 

16 

10 

17 

4 

18 

86 

1 

1 

21 

5 

12 

32 

2 

12 

1 

7 

87 

1 

2 

30 

4 

14 

20 

3 

29 

4 

27 

88 

2 

1 

26 

1 

16 

29 

4 

17 

1 

70 

89 

2 

1 

35 

3 

14 

28 

12 

10 

1 

5 

90 

1 

2 

52 

4 

14 

30 

12 

19 

2 

18 

91 

1 

1 

18 

3 

12 

20 

1 

10 

1 

6 

92 

1 

2 

53 

5 

12 

29 

24 

15 

3 

16 

93 

1 

2 

43 

2 

16 

14 

21 

38 

4 

37 

94 

1 

2 

58 

3 

16 

35 

16 

37 

2 

39 

95 

2 

1 

44 

2 

18 

11 

20 

19 

1 

11 

96 

1 

2 

40 

2 

12 

34 

12 

27 

4 

34 

97 

1 

2 

41 

2 

16 

11 

11 

36 

3 

37 

98 

1 

2 

48 

4 

16 

34 

10 

28 

3 

27 

99 

1 

2 

40 

2 

14 

25 

12 

30 

4 

28 

100 

1 

1 

60 

4 

14 

10 

24 

25 

1 

21 





Variable 


Subject 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

101 

2 

3 

31 

1 

18 

24 

7 

19 

3 

15 

102 

1 

2 

56 

3 

16 

20 

20 

42 

2 

38 

103 

1 

2 

51 

3 

12 

9 

21 

31 

3 

23 

104 

2 

1 

33 

1 

16 

28 

10 

20 

1 

23 

105 

1 

2 

29 

1 

18 

15 

5 

29 

3 

27 

106 

1 

2 

22 

1 

16 

29 

1 

22 

2 

12 

107 

1 

1 

29 

5 

12 

17 

4 

28 

1 

6 

108 

2 

1 

36 

2 

16 

25 

11 

17 

1 

12 

109 

1 

2 

32 

1 

16 

26 

2 

30 

4 

27 

110 

1 

2 

21 

4 

13 

10 

2 

11 

3 

7 

111 

1 

2 

48 

1 

16 

23 

14 

20 

4 

29 

112 

1 

2 

43 

2 

12 

29 

12 

24 

3 

30 

113 

1 

2 

50 

2 

12 

22 

30 

27 

3 

21 

114 

2 

1 

37 

3 

14 

26 

10 

14 

1 

8 

115 

1 

2 

54 

1 

16 

19 

26 

22 

2 

29 

116 

2 

2 

48 

4 

14 

23 

8 

13 

6 

11 

117 

1 

2 

57 

3 

16 

9 

20 

38 

2 

34 

118 

2 

1 

38 

5 

16 

28 

3 

12 

1 

7 

119 

1 

1 

59 

4 

14 

9 

32 

21 

1 

19 

120 

1 

2 

47 

2 

16 

31 

24 

32 

3 

33 

121 

1 

3 

58 

4 

16 

32 

20 

24 

2 

20 

122 

1 

2 

55 

1 

16 

34 

25 

29 

3 

22 

123 

1 

2 

52 

3 

14 

30 

22 

27 

4 

24 

124 

1 

2 

31 

1 

16 

33 

4 

32 

4 

27 

125 

2 

1 

43 

1 

16 

16 

3 

17 

1 

10 

126 

2 

1 

27 

4 

12 

15 

8 

10 

1 

5 

127 

1 

2 

53 

2 

12 

15 

18 

30 

2 

24 

128 

2 

3 

28 

5 

13 

5 

8 

12 

3 

18 

129 

1 

2 

40 

2 

12 

6 

15 

21 

4 

19 

130 

1 

2 

41 

4 

14 

17 

17 

19 

3 

16 

131 

1 

2 

42 

2 

16 

29 

14 

35 

4 

34 

132 

1 

1 

20 

5 

12 

19 

2 

14 

1 

8 

133 

1 

2 

43 

4 

12 

16 

18 

21 

3 

19 

134 

1 

2 

42 

3 

16 

12 

14 

40 

3 

34 

135 

1 

2 

60 

3 

12 

32 

27 

30 

2 

30 

136 

1 

2 

56 

2 

16 

15 

28 

38 

2 

33 

137 

1 

2 

18 

3 

12 

26 

1 

2 

3 

6 

138 

1 

2 

51 

2 

16 

30 

19 

44 

3 

37 

139 

1 

2 

49 

2 

12 

34 

10 

32 

5 

30 

140 

1 

2 

30 

3 

12 

14 

28 

4 

4 

27 

141 

1 

2 

30 

4 

14 

26 

5 

25 

5 

17 

142 

1 

2 

38 

3 

12 

5 

10 

38 

3 

35 

143 

1 

2 

49 

2 

14 

35 

11 

24 

3 

30 

144 

1 

3 

50 

2 

16 

22 

15 

38 

4 

33 

145 

2 

2 

47 

3 

14 

15 

20 

16 

2 

23 

146 

1 

2 

54 

3 

16 

9 

21 

41 

3 

35 

147 

1 

2 

43 

4 

16 

8 

20 

26 

4 

30 

148 

1 

1 

57 

3 

12 

34 

24 

27 

1 

31 

149 

1 

1 

18 

3 

12 

21 

1 

14 

1 

5 

150 

1 

2 

59 

2 

14 

17 

32 

28 

2 

30 



Variable 


Subject 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

151 

1 

2 

40 

2 

12 

13 

20 

24 

5 

31 

152 

1 

2 

20 

5 

14 

14 

2 

16 

2 

5 

153 

1 

1 

49 

4 

16 

9 

22 

22 

1 

19 

154 

1 

3 

55 

5 

12 

20 

26 

18 

1 

5 

155 

1 

1 

22 

2 

16 

31 

2 

20 

1 

6 

156 

1 

1 

21 

4 

15 

9 

2 

15 

1 

6 

157 

1 

2 

29 

3 

15 

28 

1 

37 

6 

18 

158 

1 

1 

23 

2 

16 

11 

1 

20 

1 

7 

159 

1 

2 

32 

2 

16 

33 

1 

41 

5 

18 

160 

1 

1 

20 

5 

12 

30 

1 

11 

1 

5 

161 

1 

1 

52 

5 

12 

28 

27 

17 

1 

8 

162 

1 

3 

41 

4 

16 

13 

14 

20 

1 

6 

163 

1 

2 

53 

3 

16 

13 

21 

35 

3 

34 

164 

1 

2 

44 

5 

14 

33 

20 

18 

4 

20 

165 

1 

2 

58 

4 

14 

14 

30 

21 

2 

19 

166 

1 

1 

47 

2 

16 

32 

22 

33 

3 

32 

167 

1 

2 

60 

5 

12 

23 

28 

15 

2 

15 

168 

1 

2 

42 

4 

16 

9 

20 

22 

4 

20 

169 

2 

2 

47 

5 

12 

25 

22 

18 

4 

12 

170 

1 

2 

43 

3 

16 

11 

14 

42 

3 

33 

171 

1 

2 

48 

2 

16 

23 

17 

45 

2 

34 

172 
13 

4 

10 

934 

1 

2 

43 

2 

16 

13 

14 

34 

4 

25 

935 

1 

2 

44 

3 

12 

33 

21 

29 

4 

19 

936 

1 

2 

37 

2 

12 

8 

15 

26 

5 

18 

937 

2 

3 

42 

2 

16 

15 

14 

25 

2 

24 

938 

1 

2 

36 

4 

12 

25 

16 

19 

5 

15 

939 

1 

2 

53 

3 

16 

20 

25 

33 

4 

20 

940 

1 

2 

39 

2 

16 

23 

16 

35 

4 

37 

941 

1 

3 

52 

2 

14 

24 

12 

26 

3 

19 

942 

2 

1 

36 

3 

14 

15 

7 

16 

1 

5 

943 

1 

2 

51 

4 

12 

34 

15 

24 

3 

20 

944 

1 

3 

27 

2 

16 

19 

3 

31 

1 

10 

945 

1 

3 

26 

5 

16 

5 

2 

27 

1 

5 

946 

1 

1 

19 

4 

12 

14 

1 

11 

1 

6 

947 

1 

1 

27 

3 

14 

34 

5 

30 

1 

9 

948 

1 

3 

49 

2 

16 

16 

17 

40 

1 

19 

949 

1 

2 

54 

2 

16 

17 

12 

45 

3 

24 

950 

2 

3 

33 

2 

18 

19 

4 

27 

3 

21 



Variable 


Subject 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

951 

1 

1 

57 

3 

12 

9 

17 

30 

1 

8 

952 

1 

2 

47 

4 

14 

33 

17 

26 

4 

20 

953 

1 

2 

59 

5 

12 

10 

29 

18 

2 

19 

954 

1 

2 

48 

2 

16 

22 

22 

37 

4 

21 

955 

1 

2 

60 

4 

12 

23 

24 

20 

2 

19 

956 

1 

2 

52 

3 

16 

18 

21 

32 

3 

30 

957 

1 

2 

40 

2 

16 

35 

14 

35 

4 

33 

958 

2 

1 

32 

1 

18 

15 

8 

23 

1 

6 

959 

1 

2 

58 

2 

12 

24 

30 

26 

2 

20 

960 

1 

2 

55 

4 

14 

23 

15 

21 

2 

18 

961 

2 

3 

33 

4 

12 

13 

11 

14 

2 

14 

962 

1 

2 

28 

1 

18 

6 

4 

27 

4 

17 

963 

1 

2 

43 

2 

12 

33 

13 

25 

4 

20 

964 

1 

2 

45 

2 

16 

7 

17 

35 

3 

32 

965 

1 

2 

49 

3 

16 

27 

12 

40 

4 

34 

966 

1 

2 

39 

1 

16 

21 

10 

28 

5 

18 

967 

1 

2 

50 

4 

16 

15 

21 

30 

3 

21 

968 

1 

2 

48 

3 

12 

35 

22 

24 

3 

19 

969 

2 

3 

39 

3 

12 

8 

14 

13 

4 

19 

970 

2 

3 

41 

4 

12 

8 

13 

14 

3 

15 

971 

2 

1 

31 

1 

18 

9 

4 

21 

1 

6 

972 

1 

2 

44 

2 

14 

12 

14 

26 

4 

20 

973 

2 

3 

36 

2 

16 

11 

5 

24 

2 

30 

974 

1 

2 

42 

2 

12 

11 

14 

25 

3 

19 

975 

1 

2 

40 

4 

12 

23 

15 

20 

4 

18 

976 

1 

2 

56 

2 

14 

16 

16 

27 

2 

20 

977 

1 

2 

39 

3 

12 

35 

19 

26 

4 

21 

978 

1 

2 

53 

3 

16 

29 

23 

45 

3 

37 

979 

2 

3 

38 

1 

16 

13 

6 

20 

2 

8 

980 

1 

2 

51 

4 

16 

23 

25 

33 

3 

24 

981 

1 

3 

47 

3 

16 

14 

17 

33 

1 

24 

982 

1 

1 

48 

2 

16 

22 

19 

37 

1 

25 

983 

1 

2 

49 

2 

12 

28 

20 

28 

3 

20 

984 

2 

3 

40 

2 

16 

20 

14 

22 

2 

18 

985 

1 

2 

54 

4 

16 

7 

24 

21 

3 

19 

986 

1 

2 

57 

3 

14 

33 

28 

30 

2 

22 

987 

2 

3 

35 

3 

14 

10 

7 

18 

1 

7 

988 

1 

2 

56 

3 

12 

25 

32 

26 

2 

19 

989 

1 

2 

55 

4 

16 

5 

24 

24 

2 

18 

990 

1 

2 

40 

3 

16 

23 

16 

32 

4 

23 

991 

1 

2 

53 

2 

16 

7 

20 

29 

3 

21 

992 

2 

3 

39 

4 

12 

8 

9 

13 

1 

6 

993 

2 

1 

22 

1 

16 

9 

1 

20 

1 

6 

994 

1 

2 

52 

2 

12 

20 

24 

21 

3 

20 

995 

2 

3 

24 

5 

12 

19 

1 

12 

2 

6 

996 

1 

2 

51 

2 

14 

5 

20 

20 

3 

19 

997 

1 

2 

50 

3 

12 

35 

19 

27 

3 

22 

998 

1 

3 

19 

5 

12 

15 

1 

10 

1 

4 

999 

1 

1 

28 

3 

16 

11 

1 

36 

1 

10 

1000 

2 

3 

32 

4 

12 

9 

3 

13 

2 

6 






Appendix III 

Summation Notation 


The symbol $\Sigma$, which we use extensively in this text, is mathematical shorthand 
notation. We use it to indicate that the items following it are to be added. When 
necessary for clarity, we include an index of summation, usually $i$, as part of the 
notation. For example, 

$$\sum_{i=1}^{4} x_i$$

instructs us to add the values of $x$ from $x_1$ through $x_4$. That is, 

$$\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4$$

Similarly, $\sum x$ or $\sum x_i$ instructs us to add all values of $x$, where the meaning of 
“all” is apparent from the context. 

The following are some algebraic properties of summation that you will find 
useful. 

1. The summation of a constant $c$ is $n$ times the constant, where $n$ is the number 
of values of the index of summation. That is, 

$$\sum_{i=1}^{n} c = nc$$

For example, 

$$\sum_{i=1}^{4} 5 = 4(5) = 20$$

2. The summation of a constant times a variable is equal to the constant times 
the summation of the variable. That is, given that $c$ is a constant, then 

$$\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i$$

For example, 

$$\sum_{i=1}^{4} 5x_i = 5 \sum_{i=1}^{4} x_i$$





If $x_1 = 2$, $x_2 = 3$, $x_3 = 6$, and $x_4 = 10$, we have 

5(2 + 3 + 6 + 10) = 5(21) = 105 

3. The summation of a sum (or difference) is the sum (or difference) of the 
individual sums. In symbols, 

$$\sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i$$

This last property extends to more than two components. For example, 

$$\sum_{i=1}^{n} (x_i + y_i - z_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} z_i$$

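The three properties above can be checked numerically. The following is a minimal Python sketch, not part of the original text: the values of `x` and the constant `c` come from the examples above, while `y` and `z` are made-up values used only to illustrate the third property.

```python
# x and c are taken from the worked examples in the text;
# y and z are hypothetical values chosen only for illustration.
x = [2, 3, 6, 10]
y = [1, 4, 0, 7]
z = [3, 3, 2, 1]
c = 5
n = len(x)

# Property 1: summing a constant c over n index values gives n * c.
prop1_lhs = sum(c for _ in range(n))
prop1_rhs = n * c

# Property 2: a constant factor can be moved outside the summation.
prop2_lhs = sum(c * xi for xi in x)
prop2_rhs = c * sum(x)

# Property 3: the summation of a sum or difference is the sum or
# difference of the individual summations.
prop3_lhs = sum(xi + yi - zi for xi, yi, zi in zip(x, y, z))
prop3_rhs = sum(x) + sum(y) - sum(z)

print(prop1_lhs, prop1_rhs)  # 20 20
print(prop2_lhs, prop2_rhs)  # 105 105
print(prop3_lhs, prop3_rhs)  # 24 24
```

Each pair of printed values agrees, which is what the corresponding property asserts for this particular set of numbers.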
Double Subscript Notation. In some cases a population may be composed of two 
or more identifiable groups or subpopulations. It is frequently convenient in such 
cases to distinguish one observation from another and also to identify the 
subpopulation to which each observation belongs. This can be accomplished by the 
use of a double subscript on each observation. For example, consider a population 
consisting of four groups or subpopulations, each containing three observations, 
as shown in the following table. 

                 Group 

            1     2     3     4 

           10     8     2     8 
           15     9     6    11 
           25    14     1     4 

Total      50    31     9    23 

We refer to the first observation in group 2 as $x_{12}$, and we may write $x_{12} = 8$. 
The second observation in group 4 is designated $x_{24}$, and we may write $x_{24} = 11$, 
and so on. 

The total of a given group is obtained by adding the observations in that group 
as shown in the table. The total for the entire population of 12 observations is 
obtained by adding the group totals. For the population shown in the table the 
population total is 50 + 31 + 9 + 23 = 113. This system of notation and 
summation may be generalized for the case of k groups as follows. 

$x_{ij}$ = the $i$th observation in the $j$th group, where $j$ identifies the group and 
$i$ distinguishes one observation from another within the group 

$\sum_{i=1}^{n_j} x_{ij}$ = the total for the $j$th group 

$\sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}$ = the grand total of all observations 

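As an illustration of the double subscript notation, the following Python sketch reproduces the group totals and the grand total for the four-group population in the table above. The dictionary layout and the helper function `x` are assumptions made for this example only; they are not part of the original text.

```python
# Observations from the table in the text, keyed by group number j.
# The dictionary layout is an illustrative choice, not the book's.
groups = {
    1: [10, 15, 25],
    2: [8, 9, 14],
    3: [2, 6, 1],
    4: [8, 11, 4],
}

def x(i, j):
    """Return x_ij, the i-th observation in the j-th group (1-based)."""
    return groups[j][i - 1]

# Total for each group: the sum over i of x_ij for fixed j.
group_totals = {j: sum(obs) for j, obs in groups.items()}

# Grand total: the sum over j of the group totals.
grand_total = sum(group_totals.values())

print(x(1, 2), x(2, 4))  # 8 11
print(group_totals)      # {1: 50, 2: 31, 3: 9, 4: 23}
print(grand_total)       # 113
```

The inner sum runs over the observations within one group and the outer sum runs over the groups, matching the order of the two summation signs in the grand-total expression.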


GREEK ALPHABET 


α  Alpha      ι  Iota       ρ  Rho 
β  Beta       κ  Kappa      σ  Sigma 
γ  Gamma      λ  Lambda     τ  Tau 
δ  Delta      μ  Mu         υ  Upsilon 
ε  Epsilon    ν  Nu         φ  Phi 
ζ  Zeta       ξ  Xi         χ  Chi 
η  Eta        ο  Omicron    ψ  Psi 
θ  Theta      π  Pi         ω  Omega 




Review Questions 


Answers to Odd-Numbered 
Exercises 


Note: Many of the following answers were obtained by computer and, consequently, may 
differ from answers obtained by hand calculations because of rounding. 


CHAPTER 2 

2.5.1 Suggested class intervals: 10-19, 20-29, 30-39, etc. Frequencies: 5, 14, 27, 26, 
16, 10, 5, 4, 3 

2.5.3 Suggested Class Intervals: 40-49, 50-59, 60-69, 70-79, 80-89, 90-99. Fre¬ 
quencies: 7, 10, 28, 33, 16, 6 


2.5.5

Class                    Cumulative    Relative     Cumulative relative
interval    Frequency    frequency     frequency    frequency

10-14           23           23         0.168         0.168
15-19           29           52         0.212         0.380
20-24           47           99         0.343         0.723
25-29           24          123         0.175         0.898
30-34            9          132         0.066         0.964
35-39            3          135         0.022         0.986
40-44            2          137         0.014         1.000

Total          137                      1.000


2.6.1 (a) 27.5 (b) 28.5 (c) 30 (d) 20 (e) 38.28 (f) 6.2 


2.6.3 (a) 57.1 (b) 56 (c) 56 and 59 (d) 10 

(e) 9.0 (f) 3.0 


2.6.5 (a) 75 (b) 50-150



2.7.1 x̄ = 45.59; Median = 42.96; s² = 336.4137; s = 18.34

2.7.3 x̄ = 70.4; Median = 71.0; s² = 155.75; s = 12.5

2.7.5 x̄ = 274.1875; s² = 10613.11; s = 103.02; Median = 261.50;
P₂₅ = 191.5; P₇₅ = 352.44; P₉₅ = 459.5


9.
     (b) Relative frequency        (c) Cumulative relative frequency

     Magazine A    Magazine B      Magazine A    Magazine B

        0.05          0.04            0.05          0.04
        0.24          0.09            0.29          0.13
        0.29          0.09            0.58          0.22
        0.19          0.14            0.77          0.36
        0.14          0.27            0.91          0.63
        0.05          0.23            0.96          0.86
        0.04          0.14            1.00          1.00



(d) Magazine A, because it has a greater proportion of subscribers in the age groups 
most likely to have babies 

(e) Magazine B, because it has a greater proportion of subscribers at or nearing 
retirement age 

11. x̄ = 4.97 s² = 2.7446 s = 1.66 Median = 5

13. x̄ = 15.2 Median = 14.5 s² = 6.6222 s = 2.57

15. x̄ = 2.9 Median = 3.15 s² = 0.88889 s = 0.94281

17. x̄ = 2.28 Median = 2 s² = 4.0433 s = 2.0108

19. x̄ = 1.204 Median = 1.2 s² = 0.0529 s = 0.23

21. x̄ = 39.2 Median = 42 s² = 47.314 s = 6.8785

23. x̄ = 365 Median = 348.5 s² = 13,754 s = 117.28

25. x̄ = 7.65625 Median = 7.15 s² = 2.212 s = 1.4873

27. (a)

Community A

Salaries                 Cumulative    Relative     Cumulative relative
($1000)     Frequency    frequency     frequency    frequency

10-19           43           43         0.4095        0.4095
20-29           15           58         0.1429        0.5524
30-39           13           71         0.1238        0.6762
40-49           12           83         0.1143        0.7905
50-59           11           94         0.1048        0.8953
60-69            9          103         0.0857        0.9810
70-79            2          105         0.0190        1.0000

Median = 19.5 + (9.5/15)(10) = 25.83






Community B

Salaries                 Cumulative    Relative     Cumulative relative
($1000)     Frequency    frequency     frequency    frequency

10-19           10           10         0.0917        0.0917
20-29           11           21         0.1009        0.1926
30-39           13           34         0.1193        0.3119
40-49           16           50         0.1468        0.4587
50-59           18           68         0.1651        0.6238
60-69           16           84         0.1468        0.7706
70-79           23          107         0.2110        0.9816
80-89            0          107         0.0000        0.9816
90-99            2          109         0.0183        0.9999

Median = 49.5 + (4.5/18)(10) = 52


(b) Community B. A greater percentage of the heads of households have annual sal¬ 
aries equal to or greater than $50,000. 

(c) Median for both cases, because both distributions are skewed. 

(d) s²_A = 329.7596, s²_B = 411.3964
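
The two medians reported in answer 27(a) follow the usual interpolation formula for grouped data. The sketch below reproduces that arithmetic; the function name and argument names are illustrative assumptions rather than notation from the text.

```python
def grouped_median(lower_boundary, width, n, cum_below, freq):
    """Interpolated median for grouped data.

    lower_boundary: lower true boundary of the class containing the median
    width:          class width
    n:              total number of observations
    cum_below:      cumulative frequency below the median class
    freq:           frequency of the median class
    """
    return lower_boundary + ((n / 2 - cum_below) / freq) * width

# Community A: n = 105, median class 20-29 (43 observations below, 15 inside).
median_a = grouped_median(19.5, 10, 105, 43, 15)

# Community B: n = 109, median class 50-59 (50 observations below, 18 inside).
median_b = grouped_median(49.5, 10, 109, 50, 18)

print(round(median_a, 2), round(median_b, 2))
```

The terms (n/2 - cum_below) evaluate to 9.5 and 4.5, matching the numerators shown in the two worked medians.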


CHAPTER 3 

3.2.1 (a) 5 (b) 0 (c) 2 (d) 44 (e) 89 (f) 125 (g) 190 (h) 45 (i) 12 (j) 218
(k) 0 (l) 18 (m) 80 (n) 100 (o) 80 (p) 18

3.2.3 {0, 1, 2, 3, 4, 5}, {3} 

3.3.1 (a) 56 (b) 60 (c) 362,880 (d) 5040 (e) 30 (f) 120 (g) 120 (h) 35 (i) 5 
(j) 70 

3.3.3 35 

3.3.5 (a) 120 (b) 60 
3.3.7 (a) 720 (b) 360 

3.6.1 (a) (i) 0.10 (ii) 0.142 (iii) 0.015 (iv) 0.106 (v) 0.3500 (vi) 0.227 

(b) (i) 0.075 (ii) 0.165 (iii) 0.010 (iv) 0.010 (v) 0.230 (vi) 0.438 (vii) 0.600 

(c) No. P(A₁ ∩ B₁) ≠ P(A₁)P(B₁), for example.

3.6.3 (a) 0.33 (b)0.17 (c) 0.50 

3.6.5 (a) 5/231 (b) 0 (c) 2/231 (d) 44/231 (e) 89/231 (f) 125/231 (g) 190/231 
(h) 45/231 (i) 12/231 

3.7.1 (a) 0.427 (b) 0.257 (c) 0.197 (d) 0.120 

3.7.3 (a) 0.2490 (b) 0.1120 (c) 0.4149 (d) 0.2241 


Review Questions 


11. 4 

13. 15,504 

15. (a) 0.021 (b) 0.040 (c) 0.004 


17. (a) Employees who voted in favor of the plan or who have children in school or both 

(b) Employees who voted in favor of the plan and have children in school 

(c) Employees who did not vote in favor of the plan 

(d) Employees who do not have children in school 


19. (a) 0 


21. 

1 12 1 

\ 4 / 

23. 

12,600 

25. 

(a) 1. 0. 


(b) 1. 0 

27. 

0.0364 


29. 


(b) A 
12! 
4!8! 


(c)U 

12 


(d) A 
11 • 10 • 9 


8 ! 


4 • 3 • 2 • 8! 


495 


2. 0.27, 0.13 3. 0.25 (for each area) 4. 0.92 5. 0.35 
2. 0.62 3. 0.75 4. 0.48 5. 0.08 6. 0.60 


0.4545 
0.5091 
1.0000 
0.45 


31. 

33. 


35. 


37 . 


(a) 15; M₁M₂, M₁F₁, M₁F₂, M₁F₃, M₁F₄, M₂F₁, M₂F₂, M₂F₃, M₂F₄, F₁F₂, F₁F₃,
F₁F₄, F₂F₃, F₂F₄, F₃F₄ (b) 0.9333 (c) 0.5333 (d) 0.6000 (e) 0.0667

(a) 0.06 (b) 0 (c) 0.50 

(a) 1/504 (b) 16/21 

(a) 0.60 (b) 0.75 (c) 0.25 (d) 0.60 



CHAPTER 4 


4.2.1 (a) P(X = xᵢ): 0.05, 0.08, 0.10, 0.12, 0.18, 0.14, 0.10, 0.09, 0.08, 0.04, 0.02
(b) P(X ≤ xᵢ): 0.05, 0.13, 0.23, 0.35, 0.53, 0.67, 0.77, 0.86, 0.94, 0.98, 1.00.

4.2.3 (a) 0.05 (b) 0.02 (c) 0.33 (d) 0.67 (e) 0.63 

4.3.1 (a) 0.3125 (b) 0.5 (c) 0.5 (d) 0.0312 (e) 0.0312 

4.3.3 (a) 0.0170 (b) 0.9984 (c) 0.3430 

4.3.5 Yes 

4.3.7 (a) 0.5846 (b) 0.0338 (c) 0.5846 (d) 0.9906 

4.4.1 (a) 0.923 (b) 0.077 (c) 0.003 

4.4.3 Binomial: (a) 0.0054 (b) 0.0380. Poisson: (a) 0.006 (b) 0.041 

4.4.5 (a) 0.073 (b) 0.004 (c) 0.294 

4.5.1 0.4286 

4.5.3 (a) 0.00476 (b) 0.5429 (c) 0.9286 

4.5.5 (a) 0.4762 (b) 0.9523 (c) 0.5952 


4.7.1 0.4382

4.7.3 0.2578

4.7.5 0.0099

4.7.7 0.9500

4.7.9 0.8934

4.7.11 -2.54

4.7.13 1.77

4.7.15 1.32

4.7.17 0.9876

4.7.19 0.0505

4.7.21 (a) 0.0228 (b) 0.9525 (c) 0.0099

4.7.23 0.9676


Review Questions 


17. (a) P(X = xᵢ): 0.10, 0.15, 0.20, 0.25, 0.15, 0.05, 0.03, 0.02, 0.02, 0.02, 0.01
(c) P(X ≤ xᵢ): 0.10, 0.25, 0.45, 0.70, 0.85, 0.90, 0.93, 0.95, 0.97, 0.99, 1.00.

19. (a) 0.0480 (b) 0.0171 (c) 0.7454 (d) 0.0003

21. (a) 0.380 (b) 0.620 (c) 0.895 (d) 0.105 (e) 0.041 (f) 0.959

23. 0.9010

25. 3.33

27. 265.30

29. (a) 0.0013 (b) 0.0062 (c) 0.7745 (d) 728 (e) 2.28%

31. 0.1320

33. (a) 0.1935 (b) 0.0018 (c) 0.0018

35. 0.2131

37. 0.8414

39. (a) 0.0228 (b) 309

41. (a) 64.75 (b) k = 118.45 (c) 130.15 (d) k = 131.80, k′ = 68.20

43. 14.90

45. 10.6

47. (a) 2, 1.8 (b) 4, 3.2 (c) 6, 4.2 (d) 8, 4.8 (e) 10, 5 (f) 12, 4.8 (g) 14, 4.2
(h) 16, 3.2 (i) 18, 1.8. σ² largest for p = 0.5; σ² smallest for p = 0.1 and
p = 0.9

49. (a) 0.875 (b) 0.492 (c) 0.265

51. (a) 0.149 (b) 0.050 (c) 0.801 (d) 0.185

53. (a) 0.933 (b) 0.049 (c) 0.922 (d) 0.220 (e) 0.697

55. 0.133

57. (a) 0.0985 (b) 0.5645 (c) 0.0011 (d) 0.0702

59. 0.0624



CHAPTER 5 


5.4.1 (a) 99.8, 0.548 (b) 0.5554

5.4.3 0.9237

5.4.7 0.0174

5.5.1 0.0071

5.5.3 0.4778

5.6.1 0.8413

5.6.3 0.4658

5.7.1 (a) 0.07, 0.05 (b) 0.3811 (c) 0.1587

5.7.3 0.015


Review Questions


13. (a) 32, 1.5 (b) 0.8854 (c) 0.0228 (d) 0.0918 (f) random

15. 0.0475

17. 0.0351

19. 0.0089

21. 0.0212

23. 0.8764, 0.0618

25. 0.0668

27. 0.2005

29. 0.0475

33. 0.0034




CHAPTER 6 


6.3.1 8.6, 9.0

6.3.3 63, 67

6.3.5 18.67, 28.67

6.4.1 312, 412

6.4.3 25, 27

6.5.1 52, 68

6.5.3 0.23, 1.27

6.5.5 1.54, 2.78

6.6.1 (a) s² = 229.4; 5.6, 24.6 (b) df ≈ 42; 5.7, 24.5

6.6.3 (a) s²_p = 0.019; -0.17, 0.05 (b) df ≈ 23; -0.17, 0.05

6.7.1 0.45, 0.55

6.7.3 0.64, 0.76

6.8.1 0.14, 0.26

6.8.3 -0.02, 0.16

6.9.1 82

6.9.3 157

6.9.5 157

6.10.1 385

6.10.3 218

6.10.5 162

6.11.1 1.89, 13.33. The sample was randomly selected from a normally distributed
population.

6.11.3 19.45 < σ² < 90.26

6.12.1 0.52, 5.15

6.12.3 1.6086 < σ₁²/σ₂² < 10.5978




15. 2.482, 2.498

19. 0.02, 0.08

21. 0.02, 0.08

23. 12, 18

25. 0.07, 0.23

27. 0.53, 0.67

29. 5.1, 7.6

31. 0.36, 0.44

33. 0.82, 0.92

35. $34, $50

37. 0.45 < σ₁²/σ₂² < 4.23

39. 171

41. 7.8, 20.2

43. 1476

45. 0.004, 0.216

47. 2.20 < σ² < 8.11

49. 12.706, 23.161

51. 167.559, 182.041

53. 0.0724, 0.2610; No

55. 32, 36

57. 943



Statistics at Work, sj = 0.571212; 0.01, 

1.19 



CHAPTER 7 

7.3.1 Reject null hypothesis, z = -3.00; p = 0.0026

7.3.3 Since 3.00 > 1.96, reject H₀, p = 0.0026

7.4.1 Yes, t = 2.50; 0.01 < p < 0.025

7.4.3 t = 3.115; p < 0.005

7.4.5 Yes; t = 2.264; 0.025 > p > 0.01

7.5.1 z = -2.00; p = 0.0456

7.5.3 z = 2; p = 0.0228

7.5.5 Yes; z = -2.347; p ≈ 0.0094

7.6.1 (a) s² = 11.25; No, t = -2.79; 0.01 > p > 0.005
(b) df ≈ 19, t = -2.73; 0.01 > p > 0.005

7.6.3 t = 5; p < 0.005

7.6.5 No; t = 0.1880; p > 0.10

7.7.1 Yes, z = 6.35; p < 0.001

7.7.3 Yes, z = -4.00; p < 0.001

7.7.5 Yes; z = -8.5888; p < 0.0002

7.8.1 No, z = -1.31; p = 0.0951

7.8.3 Yes, z = 2.50; p = 0.0062

7.9.1 No, z = -1.11; p = 0.2670

7.9.3 z = 0.88; p = 0.1894

7.10.1 Cannot reject H₀, X² = 13.33; p > 0.10 (one-sided test)

7.10.3 No, X² = 33.6; 0.10 > p > 0.05

7.11.1 Yes, F = 3.00; 0.01 < p < 0.025

7.11.3 F = 1.59, cannot reject H₀; p > 0.10 (two-sided test)

7.12.1

Alternative                 Value of power function
value of μ        β         1 − β

516            0.9500       0.0500
521            0.8461       0.1539
528            0.5596       0.4404
533            0.3156       0.6844
539            0.1093       0.8907
544            0.0314       0.9686
547            0.0129       0.9871


7.12.3

Alternative                 Value of power function
value of μ        β         1 − β

4.25           0.9900       0.0100
4.50           0.8599       0.1401
4.75           0.4325       0.5675
5.00           0.0778       0.9222
5.25           0.0038       0.9962


7.13.1 n = 548; C = 518.25. Select a sample of size 548 and compute x̄. If x̄ > 518.25,
reject H₀. If x̄ < 518.25, do not reject H₀.

7.13.3 n = 103; C = 4.66. Select a sample of size 103 and compute x̄. If x̄ > 4.66,
reject H₀. If x̄ < 4.66, do not reject H₀.


Review Questions 


17. Yes, z = -1.67; p = 0.0475

19. Yes, t = -2.50; 0.025 > p > 0.01

21. Yes, z = -3.00; p = 0.0013

23. Yes, t = 4.84. The samples were randomly and independently drawn from
normally distributed populations. p < 0.005

27. No, z = 1.00; p = 0.1587

29. H₀: p_A ≤ p_B; H₁: p_A > p_B; No. p̄ = 0.504, z = 0.62, p = 0.2676

31. No, F = 2.03; 0.025 < p < 0.05

33. z = -2.5; p = 0.0062

37. z = 2.67; p = 0.0038

39. t = 1.421; 0.10 > p > 0.05

43. t = -2.763; 0.01 > p > 0.005

45. z = 1.72; p = 0.0427

47. Since z = 3 > 1.645, we reject H₀. p = 0.0013

49. χ² = 10(1225)/625 = 19.6 < 20.483, and H₀ cannot be rejected.
0.05 < p < 0.10

51. Assume that the two groups are independent random samples from normally
distributed populations. Since F = 81/36 = 2.25 > 1.89, we reject H₀.
0.01 < p < 0.025.

53. No; t = -1.3474; 0.20 > p > 0.10

55. Yes; t = -3.5841; p < 0.01

57. No; z = -1.7375; p = 0.0818

59. No; z = 1.02; p = 0.1539

61. Yes; d̄ = -5.5, s_d = 4.58, t = -4.160, p < 0.005

63. Yes; t = 4.704

65. No; t = 1.337; p = 0.0901

Statistics at Work Popular Record Marketing: t = 8.18 (authors’ results), p < 0.01, 
reject H () \ ji s = /v. 

Raising Finance: d̄ = -0.0767, s²_d = 0.1102, t = -0.8003, fail to reject H₀; d̄ =
4.8650, s²_d = 0.2187, t = 36.0370, reject H₀; d̄ = 5.4733, s²_d = 0.3362, t = 32.6994,
reject H₀.





CHAPTER 8 


Review Questions 


8.2.3 Yes, F = 94.04; p < 0.005

8.3.1 F = 10.24; p < 0.005



Table of differences between means

                x̄_B       x̄_C       x̄_A       x̄_D

x̄_B = 2.44      —         0        4.12**    4.34**
x̄_C = 2.44                —        4.12**    4.34**
x̄_A = 6.56                          —        0.22
x̄_D = 6.78                                    —


Table of differences between means

                 x̄_A       x̄_B       x̄_C

x̄_A = 17.14      —        2.15      6.29**
x̄_B = 19.29                —        4.14**
x̄_C = 23.43                          —


No, F = 3.01; 0.10 > p > 0.05

Yes, F = 9.12; p < 0.005

Yes, F = 59.75; p < 0.005

Yes, F = 171.86; 0.005 < p < 0.01

F(pressure) = 0.40, F(temp.) = 6.23, F(interaction) = 35.89;
p(A) > 0.10; p(B) < 0.005; p(AB) < 0.005


F (competitive activity) = 1.54, ^(geo. reg.) = 8.29, F (interaction) = 

p( A) > 0.10, 

p( B) < 0.005; p(AB) < 0.005 


Since 22.79 > 5.49, reject H 0 . p < 0.005 



No, F = 2.77; p > 0.10




Table of differences between means

                 x̄_A      x̄_B      x̄_D      x̄_C      x̄_E

x̄_A = 23.0       —        2.8      6.1*     6.4*     19.6*
x̄_B = 25.8                —        3.3      3.6      16.8*
x̄_D = 29.1                         —        0.3      13.5*
x̄_C = 29.4                                  —        13.2*
x̄_E = 42.6                                           —


27. Since 9.77 > 5.14, reject H₀. 0.01 < p < 0.025

29. Since 21.41 > 3.26, reject H₀. p < 0.005

31. F_A = 6.33, p < 0.005; F_B = 38.87, p < 0.005;
F_AB = 4.97, 0.01 > p > 0.005


Review Questions 


33. F = 8.30 

35. F = 18.82; p < 0.005 

37. F = 59.13; p < 0.005 

39. F = 9.83; p < 0.005 

Statistics at Work. TV Commercials: F = 10.47 (authors’ results), HSD = 0.656401; 
differences between means 1 and 2 and 1 and 3 are significant. 

Job Training and Worker Satisfaction: F = 15.02, p < 0.01 (authors’ results), between 
df = 1, within df = 2230.


CHAPTER 9 


9.4.1 

y = -0.1329 + 

oo 

\D 

O 

o 







9.4.3 ŷ = 78.2210 + 1.5161x







9.5.1 (a) r² = 0.92 (b) F = 145.8; p < 0.005 (c) t = 12.12; p < 0.01
(e) 0.05, 0.07

9.5.3 (a) r² = 0.90 (b) F = 122.89; p < 0.005 (c) t = 11.08; p < 0.01
(e) 1.22, 1.81

9.6.1 (a) 2.96 ± (2.1604)(1.16447)(0.3182) (b) 2.96 ± (2.1604)(1.16447)( 1.0494) 
9.6.3 (a) 169.19 ± (2.1604)(7.1924)(0.2947) (b) 169.19 ± (2.1604)(7.1924)(1.0425) 

9.6.5 Confidence limits for μ_y|x

  x      Lower     Upper

 20       9.06     12.52
 30      15.25     18.12
 40      21.38     23.77
 50      27.42     29.52
 60      33.34     35.39
 70      39.11     41.40
 80      44.79     47.51
 25      12.16     15.31
 90      50.41     53.68
100      55.99     59.88

ŷ = -0.9954 + 0.5893x



9.8.1 (b) Reject H₀, since t = 5.53 > 2.306; p < 0.01 (c) z = -1.08, p = 0.1401

9.8.3 Reject H₀: ρ = 0, since t = 6.39 > 2.306; p < 0.01 (c) z = -0.80,
p = 0.2119

Review Questions


15. (a) ŷ = 54.61229 + 0.394782x (b) r = 0.2151 (c) t = 0.934 (d) p > 0.20

17. (a) ŷ = 22.6046 + 12.6714x

(b)

Source          SS            df    MS            F

Regression      18,535.15      1    18,535.15     78.306
Error            3,313.85     14       236.70
Total           21,849.00     15

(c) p < 0.005 (d) ŷ = $111.30

19. (b) ŷ = 0.78336 + 0.56821x (c) t = 18.148, p < 0.005 (d) 0.962
(e) 22.69, 35.70

23. (a) ŷ = 4.78014 + 0.0897922x

(b)

Source          SS          df    MS          F

Regression      40.4723      1    40.4723     10.264
Error           51.2613     13     3.9432
Total           91.7336     14

(c) Since 10.264 > 4.67, reject H₀.

Statistics at Work. Is Ignorance Bliss? p < 0.01 (authors’ results), df = 2648, t =
-3.3001, reject H₀.

Poisoning Livestock: y = 68.12, t = 9.2736, p < 0.01. 


CHAPTER 10 

10.3.1 ŷ = -30.5761 + 1.0406x₁ + 0.8390x₂

10.3.3 ŷ = -3.8162 + 68.9486x₁ + 0.2457x₂

10.4.1 (a) 0.98 (b) F = 173.77; p < 0.005 (c) t(b₁) = 5.46; p < 0.01; t(b₂) = 6.36;
p < 0.01 (d) 1.0406 ± (2.3646)(0.1905); 0.8390 ± (2.3646)(0.1319)

10.4.3 (a) 0.98 (b) F = 146.48; p < 0.005 (c) t(b₁) = 4.02; p < 0.01; t(b₂) = 1.31;
p > 0.20 (d) 68.9486 ± (2.3646)(17.1460); 0.25 ± (2.3646)(0.1878)

10.5.1 52.9979 ± (2.3646)(1.47490)^(1/2) [(1/10) + 0.0246(-2.7)² + 0.0118(-7.5)² +
2(-0.0136)(-2.7)(-7.5)]^(1/2). For prediction interval add 1 under the last
radical.

10.5.3 24.7155 ± (2.3646)(3.39)^(1/2) [(1/10) + 86.7209(-0.13)² + 0.0104(-0.9)² +
2(-0.9015)(-0.13)(-0.9)]^(1/2). For prediction interval add 1 under the last
radical.

10.6.1 (a) 0.9796 (b) F = 83.35; p < 0.005 (c) r_y1.2 = 0.9518, r_y2.1 = 0.1685,
r_12.y = 0.0658 (d) t = 8.21; p < 0.01

10.6.3 (a) 0.8577 (b) F = 11.13; p < 0.005 (c) r_y1.2 = -0.8570, r_y2.1 = 0.4405,
r_12.y = 0.4639 (d) t = -4.70; p < 0.01

9. (a) ŷ = 11.43 + 1.26x₁ + 3.11x₂ (b) R²_y.12 = 0.92

Source          SS            df    MS         F

Regression      1827.0046      2    913.50     69.048
Residual         158.7286     12     13.23
Total           1985.7333     14

p < 0.005

11. (a) ŷ = 2.08 + 0.06x₁ + 1.05x₂ (b) R²_y.12 = 0.8506; F = 34.1512; p < 0.005
(c) t₁ = 0.023; p > 2(0.10) = 0.20, do not reject H₀. t₂ = 3.221;
p < 2(0.005) = 0.01, reject H₀. (d) 0.34, 1.76 (e) ŷ = 28.45 (f) 13.44, 43.46
(g) 21.26, 35.64

13. (a) ŷ = 117.03 - 5.32x₁ + 0.1118x₂; R²_y.12 = 0.8635; F = 37.95; p < 0.005.
t₁ = 6.03; p < 0.01, reject H₀. t₂ = 0.27, p > 0.20, do not reject H₀. (b) 99.377,
117.875 (c) 104.626, 112.626

15. (a) R = 0.9976; F = 933; p < 0.005 (b) r yl 2 = 0.505; r y2J = 0.084; r n v = 

0.902 (c) t, = 1.755; 0.20 > p > 0.10 t 2 = -0.253; /? > 0.20 t 3 = 6.268; 

p < 0.01 

Statistics at Work. All coefficients significant at 0.05 level except EMP (authors’ re¬ 
sults). 


CHAPTER 11 


11.3.1 X² = 22.94; p < 0.005

11.3.3 X² = 7.79; p > 0.10

11.3.5 X² = 78.0001; Critical χ² = 12.592. Reject H₀; p < 0.005

11.3.7 X² = 111.67; p < 0.005

11.4.1 X² = 206.45; p < 0.005 Reject H₀.

11.4.3 X² = 1.37883; p > 0.10 Do not reject H₀.

11.4.5 X² = 17.8854; 0.01 > p > 0.005 Reject H₀.

11.5.1 X² = 46.647; p < 0.005 Reject H₀.


Review Questions 


9. X² = 2.7019; p > 0.10

11. X² = 2.64545; p > 0.10 Do not reject H₀.

13. X² = 9.14807; 0.10 > p > 0.05 Do not reject H₀.

15. X² = 18.5; p < 0.005 Reject H₀.

17. X² = 1.9106 Do not reject H₀. p > 0.10

19. X² = 76.3906 Reject H₀. p < 0.005

21. X² = 2.577; p > 0.10 Do not reject H₀.

23. X² = 71.4431; p < 0.005 Reject H₀.

25. X² = 33.7143; p < 0.005 Reject H₀.

27. X² = 38.4759; p < 0.005 Reject H₀.

29. X² = 206.45; p < 0.005

31. X² = 22.0; p < 0.005

Statistics at Work. Shoplifting: df = 6, significant at 0.00001 level (authors’ results),
reject H₀. Popular Music Artists: df = 6, p < 0.0001 (authors’ results), reject H₀.
Alternative Heat Sources: X² = 4.20, p < 0.05 (authors’ results), reject H₀.


CHAPTER 12 

12.5.1 r = 6, not significant

12.5.3 Since 2 < 7 < 9, H₀ cannot be rejected.

12.6.1 Do not reject H₀. p > 0.054

12.6.3 Reject H₀. 0.023 > p > 0.005

12.7.1 Since 178.5 > 160, reject H₀. 0.01 > p > 0.002

12.7.3 Since 272 > 227 > 128, do not reject H₀. p > 0.20

12.8.1 P(k ≥ 9 | 10, 0.5) = 0.0107; p = 0.0107; Reject H₀.

12.8.3 P(k ≥ 10 | 12, 0.5) = 0.0193; Reject H₀. p = 0.0193

12.9.1 H = 3.5; cannot reject H₀. p > 0.102

12.9.3 H = 24.425; H₀ can be rejected at 0.01 level. p < 0.005

12.9.5 H = 9.69; p < 0.01; Reject H₀.

12.10.1 χ²_r = 3.08; Cannot reject H₀. p > 0.10

12.10.3 χ²_r = 10.4; Reject H₀. 0.01 > p > 0.005

12.11.1 r_s = 0.703; p < 0.001

12.11.3 r_s = 0.864; p < 0.001 (one-sided test) Reject H₀.

12.11.5 r_s = 0.7455; 0.005 < p < 0.010

12.12.1 r_s = 0.6676; 0.01 > p > 0.002


Review Questions


7. Since 6 < 12 < 16, do not reject H₀. p > 0.05

9. r_s = 0.918; p < 0.001 (one-sided test); Reject H₀.

11. r_s = 0.7804; Since 0.7804 > 0.7464, p < 2(0.001) = 0.002. Reject H₀.

13. H = 4.46; Do not reject H₀. p > 0.102

15. χ²_r = 27.76; Reject H₀. p < 0.005

17. H = 7.549; p < 0.01

19. χ²_r = 7.143; p = 0.027

21. T₊ = 34.5; T₋ = 1.5; 0.008 > p > 0.004

23. T₊ = 49.5; T₋ = 28.5; p > 0.110

Statistics at Work. Magazine Ads and the Fog Index: df = 8, p > 0.10. Business
Ethics: r_s = 0.96, p < 0.01.


CHAPTER 13 

13.2.1 (b) ŷ_t = 265.33 - 12.77x (c) 355, 342, 329, 316, 304, 291, 278, 265, 253,
240, 227, 214, 201, 189, 176

13.2.3 (b) ŷ_t = 618.86 - 8.16x (c) 725, 709, 692, 676, 660, 643, 627, 611, 594,
578, 562, 545, 529, 513

13.3.1 352, 302, 242, 250, 197, 209, 194, 249, 261, 245, 234

13.3.3 719, 696, 663, 643, 628, 628, 614, 595, 578, 559, 545, 534

13.4.1 (b) 59.34, 20.02, 193.53, 101.17, 192.10, 292.20, 93.90, 49.33, 94.62, 37.18,
23.83, 42.78 (c) 7, 60, 3, 10, 9, 11, 26, 18, 11, 13, 71, 2

13.4.3 Estimated number of pairs: 9, 3, 29, 15, 29, 44, 14, 7, 14, 6, 4, 6

13.5.1 (b) 117.9, 137.9, 139.8, 96.6, 149.7, 141.3, 96.4, 141.8, 147.7, 120.9, 129.0,
132.2, 55.3, 109.1, 90.1, 50.7, 111.9, 89.6, 102.6, 44.3, 61.6, 117.5, 106.5,
58.2, 31.6, 49.8, 16.8, 25.1, 46.9, 70.6, 64.8, 55.0, 78.1, 71.4, 79.7, 115.1,
130.6, 135.3, 165.1, 146.2, 78.2, 107.3, 131.9, 143.1, 121.8, 121.8, 108.6,
97.2, 128.2, 134.6

13.5.3 (b) 176.0, 161.6, 137.5, 82.1, 45.5, 77.8, 101.9, 90.5, 81.3, 104.6, 124.1,
104.0, 82.1, 99.1, 114.0



Review Questions 


13.6.1 163.17; 150.40; 137.63 

13.6.3 496.46; 140.72, 109.67, 87.76, 158.13 

13.8.1 (a) 166.67, 116.67, 133.33, 130.00 (b) 132.00 (c) 132.6 (d) 130.2 (e) 132.6 


15. (b) ŷ_t = 2.10 + 0.08x

(c)       (e)                    (d)
ŷ_t       Cyclical relatives     3-year moving average

1.62      100.00
1.70      100.00                 1.69
1.78       98.31                 1.77
1.86       99.46                 1.87
1.94      103.09                 1.98
2.02      103.96                 2.07
2.10      100.00                 2.13
2.18      100.92                 2.20
2.26      101.77                 2.27
2.34       99.15                 2.35
2.42      100.00                 2.41
2.50       99.20                 2.47
2.58       96.90


17. (b) ŷ_t = 6.53 - 0.42x

(c)       (e)                    (d)
ŷ_t       Cyclical relatives     3-year moving average

9.47      101.37
9.05      104.97                 9.37
8.63      104.29                 9.00
8.21      103.53                 8.40
7.79       98.84                 7.73
7.37       94.98                 7.07
6.95       93.53                 6.50
6.53       91.88                 6.10
6.11       94.93                 5.73
5.69       94.90                 5.43
5.27       96.77                 5.07
4.85       96.91                 4.73
4.43       99.32                 4.53
4.01      112.22                 4.40
3.59      119.78



CHAPTER 14 

14.6.1 (a) 30,500 (b) 50.83 (c) 47,700 (d) 0.13 (e) 50.120, 51.546 (f) 30,072; 
30,928 

14.6.3 252.3, 262.0; 176,622; 183,378

14.7.1 (a) 1105 (b) 870 (c) 4.42 (d) 0.01392 (e) 1047, 1163 (f) 4.19, 4.65 

14.9.1 (a) $2500 (b) $2950 (c) $1480 (d) $1300 

14.9.3 (a) 61 (b) 11 

14.9.5 (a) 17 (b) 146, 87, 58 (c) 23, 57, 119 (d) 14, 51, 129 

11. x̄_cl = 18; V̂(x̄_cl) = 0.0525; 17.55, 18.45

13. x̄_st = $1127.50; $1113.78, $1141.22


Review Questions 



CHAPTER 15 


15.2.1 (a) Do not sponsor program. (b) Sponsor program. (c) Sponsor program.
(d) Sponsor program. (e) Sponsor program.

15.2.3 (a) A₂ (b) A₂ (c) A₃ (d) A₃ (e) A₂

15.3.1 (a) $8500, $13,500, $22,500 (b) $8500 (c) c, (d) c, (e) $55,000 (f) Yes,
$8500 (g) c, (h) $8500


CHAPTER 16 

16.2.1

Sample #      x̄         R

41            1.25      58
42           -1.75       6
43           -1.25      12
44          -11.75      76
45           -2.00      10
46           -3.75      15
47          -12.25       8
48           -1.00       5
49            8.25       5
50            4.00       7


p̄ = 0.13, LCL = 0 (cannot be negative), UCL = 0.27

(a) UCL revised = 0.3675, LCL revised = 0.0291

Time period     p        Time period     p

26             0.12      39             0.18
27             0.20      40             0.24
28             0.14      41             0.12
29             0.38      42             0.06
30             0.08      43             0.22
31             0.16      44             0.16
32             0.22      45             0.18
33             0.14      46             0.20
34             0.18      47             0.18
35             0.10      48             0.40
36             0.20      49             0.12
37             0.26      50             0.20
38             0.36




Values for samples 29 and 48 fall outside the control limits. Therefore the process 
does not remain under control. 

16.4.1 (a) 80, 0, 1 (b) 80, 5, 6 (c) 80, 14, 15 (d) 80, 3, 4 (e) 80, 7, 8 

16.5.1 Select 28 items; if the mean psi at failure is greater than or equal to 2681.09 
accept the lot. If the mean psi at failure is less than 2681.09, reject the lot. 

16.5.3 Since 3.94 > 3.91, reject the lot. 



Review Questions 


17. (a) p̄ = 0.1252, LCL = 0.0259, UCL = 0.2245 (c) Only sample 23 is outside
the control limits.

19. n = 194, K = 16.02. Select a sample of size 194. If x̄ < 16.02, classify the day’s
operation as unacceptable. If x̄ > 16.02, classify it as acceptable.



Index 


Acceptable quality level (AQL), 556 
Acceptance region, 197 
Acceptance sampling, 553-570 
for attributes, 553-560 
by variables, 560-570 
Addition rule, 61 
Additive model, 444 
Aggregative price indexes, 475-479 
Alternative act, 511 
Alternative hypothesis, 193 
Among-groups sum of squares, 258 
Analysis of variance, 250-299 
definition, 251 
one way, 253 

one way by ranks, 424-427 
two way, 269 

two way by ranks, 427-430 
a posteriori comparisons, 265 
a priori comparisons, 265 
Arithmetic mean, 21, 23 
Autocorrelation, 322 
Average deviation, 26, 27 
Average outgoing quality curve, 559 
Average sample number curve, 559 
Average total inspection curve, 559 

Bayes’ criterion, 517 
application, 521-531 
Bayes’ theorem, 65-69 
Bernoulli process, 80 
Binomial distribution, 80-88 
mean of, 85 
table of, 590-619 
variance of, 85 

Bivariate normal distribution, 328 
β, confidence interval for, 318
hypothesis test for, 314-318 

Central limit theorem, 127 
Central tendency, 21-24 
measures of, 21-24 


Chebyshev’s theorem, 29-30, 85 
Chi-square distribution, 177, 375-405
goodness-of-fit test, 378-386 
small expected frequencies, 382 
mathematical properties, 376-378 
table of, 628
test of homogeneity, 393 

small expected frequencies, 395 
test of independence, 386-393 
small expected frequencies, 389 
Class intervals, 12, 13, 15, 16 
Cluster, 485 

Cluster sampling, 485, 491-496 
sample size, 502 

Coefficient of determination, 319-321 
interpretation of, 319, 320 
Coefficient of multiple determination, 351, 352 
Combination, 52 
Complement, 46 
Complementary events, 64 
Completely randomized design, 252-264 
Composite indexes, 475 
Computers, 8, 9 
in analysis of variance, 268 
in multiple regression, 357 
in simple regression analysis, 326, 327 
Conditional probability, 59-61 
Confidence band, in simple linear regression, 325, 
326 

Confidence coefficient, 153 
Confidence interval, practical interpretation, 154 
probabilistic interpretation, 154 
see also Interval estimation 

Confidence intervals for mean of Y, multiple regression, 
356-357 

simple regression, 325 
Confidence limits, 152 
Consistency, 150 
Consumer Price Index, 474 
Contingency table, 386 
Continuous variable, 10 



Control charts, 539-553 
attributes, 546-553 
variables, 539-546 
Convenience sample, 504 
Correlation analysis, 301, 328-335 
rank, 430-433 

Correlation coefficient, confidence interval for, 333, 334 
hypothesis test for, 330-333 
Correlation coefficient, simple, 330-335 
multiple, 359 

Correlation model, simple, 328, 329 
multiple, 359-362 
Counting techniques, 48-55 
Critical value, 197 
Cumulative distribution, 76 
Cumulative relative frequency distribution, 15 
Curvilinear regression, 364 
Cyclical irregulars, 464 
Cyclical variation, 443, 462-468 

Decision rule, 197 
Decision theory, 509-537 
Decision theory and classical inference, 535, 536 
Dependent variable, 303 
Descriptive measures, 20-38 
computed from grouped data, 30-38 
Descriptive statistics, 8 
Deseasonalized data, 460 
Discrete variable, 10 
Dispersion, 24-29 
measures of, 24-29
computed from grouped data, 32 
Distribution-free statistics, 407 
Doolittle method, 349 
Dummy variables, 364 

Efficiency, 150, 151, 291 
Empty set, 45 
Entity, 9 

Erratic variation, 443 
Error sum of squares, 258, 259 
simple linear regression, 312 
Estimation, 146 
Estimator, 147-151 
Event, 44, 56, 511 
equally likely, 56 
independent, 62 
Expected value, 78 


Experiment, 44 
Experimental unit, 252 
Explained deviation, 311 

Explained sum of squares, simple linear regression, 312 
Explanatory variables, 345 
Exponential smoothing, 472 
Extrapolation, 336 

F distribution, 180 
table of, 629-638 
F test, 230 
Factorial, 49 

Factorial experiment, 281-290 
advantages, 283, 284 

Finite population correction factor, 130, 488 
Fisher’s z, 332-334 
table of, 641 
Forecasting, 468-473 
Frequency distribution, 12-16 
Frequency polygon, 17, 18 
Friedman test, 427-430 
table for, 648, 649 
Function, 9 

Gap, 485 

Gaussian distribution, see Normal distribution 
Gauss multiplier, 353 

Goodness-of-fit test, see Chi-square, goodness-of-fit test 
Grouped data 
coding in, 35-37 

descriptive measures computed from, 30-38 

Histogram, 16, 17 
Hotelling’s transformation, 333 
Hurwicz criterion, 517 
Hypergeometric distribution, 92-96 
mean, 96 
variance, 96 
Hyperplane, 345 
Hypothesis, 192 
alternative, 193 
null, 193 

Hypothesis testing, 191-249 
for difference between means of two normally 
distributed populations, 215-220 
equal population variances, 217, 218 
known population variances, 216, 217 
unequal population variances, 218, 219 





unknown population variances, 217 
for difference between means of two nonnormally 
distributed populations, 220-223 
for difference between two population proportions, 
225-228 

for mean of a nonnormally distributed population, 
213-215 

for mean of a normally distributed population—known 
population variance, 200-208 
for mean of a normally distributed 

population—unknown population variance, 

208-213 

for population proportion, 223, 224 

for ratio of the variances of two populations, 230-232 

relation to interval estimation, 204 

steps in, 193-199 

for variance of a population, 228-230 

Independent variable, 303 
Index numbers, 474-479 
Inferential statistics, 8 
Interaction, 281-283 
Intersection, 46 
Interval estimate, 151 

practical interpretation of, 154 
probabilistic interpretation of, 154 
Interval estimation, 145-190 

of difference between population proportions, 170, 171 
of difference between two means, nonnormally 
distributed populations, 156 
of difference between two means—unequal population 
variances, 166-167 

of difference between two population means—known 
population variances, 162-164 
of difference between two population means—unknown 
population variances, 164-168 
of mean difference, 160-162 
of population mean—known population variance, 
151-157 

of population mean—nonnormally distributed 
population, 156 

of population mean—unknown population variance, 
157-162 

of population proportion, 168, 169 
of population variance, 177-180 
of ratio of the variances of two populations, 180-183 
Interval scale, 409 
Irregular variation, 443 


Joint distribution, 328 
Joint probability, 63 
Judgment sample, 504 

Kruskal-Wallis test, 424-427 
table for, 646, 647 

Laspeyres index, 476, 477 
Latin-square design, 275-280 
Least squares, method of, 306 
in multiple regression, 345 
Level of significance, 196 
Likelihood, 66 
Location parameters, 24 


Management information system, 1 
Mann-Whitney test, 415-419 
table for, 644, 645 
Maximax criterion, 516 
Maximin criterion, 515 
Mean, 21 

grouped data, 31, 32 
shortcut formula, 36 
properties of, 23 
Measurement, 408 
Measurement scales, 408, 409 
Measures of central tendency, 21-24 
Measures of dispersion, 24-29 
Median, 23 
grouped data, 33, 34 
properties of, 24 

Method of least squares, 306, 445 
Method of semiaverages, 445 
Minimax criterion, 516 
Missing data, 291 
Modal class, 34 
Mode, 24 
grouped data, 34 
Moving average, 452, 453 
Multicollinearity, 366
Multiple correlation 
analysis, 359-362 
model, 359, 360 
Multiple regression 
analysis, 343-374 
equation, 345 
model, 344, 345 



Multiplication rule, 62 
Multiplicative model, 444 
Mutually exclusive events, 58 

Nominal scale, 408 

Nonparametric regression, see Simple linear regression, 
nonparametric 

Nonparametric statistics, 406-441 
advantages and disadvantages, 409, 410 
when to use, 407, 408 
Nonprobability sampling, 119, 504 
Normal approximation to the binomial distribution, 108 
Normal distribution, 100-112 
applications, 106 
characteristics, 100-102 
table of, 624, 625 
Null hypothesis, 193 
Null set, 45 

Observational unit, 485 
Ogive, 19 
One-sided test, 204 
One-way analysis of variance, 253 
and the t distribution, 262, 263 
Open-end class intervals, 16 
Operating characteristic curve, 236 
Ordered array, 11 
Ordinal scale, 408, 409 

p value, 206-210 
advantages of, 207 
calculation of, 206, 207, 209, 210 
Paasche’s index, 477, 478 
Paired comparison test, 210-212 
Paired difference test, see Paired comparison test 
Paired observations, 160, 161, 210-212
Pairwise comparisons, 265 
Parameter, 20 

Partial correlation, 360-362 
Partial regression coefficients, 346 
inference procedures, 353-355 
Payoff, 511 
Payoff table, 511 
Percentile, 34, 35 
Permutation, 49 

of objects that are not all different, 53 
Plane, regression, 345 
Point estimate, 147 


Poisson distribution, 88-92 
table of, 620-623 
Poisson process, 89, 90 
Population, 10, 485 
ordered, 497 
periodic, 497 
random, 497 
sampled, 146, 147 
target, 146, 147 
Posterior analysis, 528 
Posterior probabilities, 67, 522 
Post hoc comparisons, 265 
Power, 232-236 
Power curve, 234-236 
Prediction interval 
multiple regression, 355, 356 
simple linear regression, 324, 325 
Predictor variables, 345 
Preference theory, 532 
Preposterior analysis, 523 
Price indexes, 474-479 
Prior analysis, 522 
Prior probabilities, 66, 522 
Probability, 44-73 
a posteriori, 55 
a priori, 55 
classical, 55 
conditional, 58, 59 
definition, 44 
marginal, 61 
objective, 55 
properties, 57 
relative frequency concept, 55 
subjective, 57 
unconditional, 58 
Probability density function, 99 
Probability distribution 
continuous random variable, 96-112 
discrete random variable, 75-80 
mean, 78 
variance, 78 
Probability function, 9 
Probability sampling, 485 
Pseudorandom numbers, 122 

Qualitative variable, 10 
Quality control, 538-571 
Quantitative variable, 9 


Quantity indexes, 475 
Quartile, 34, 35 
Quota sample, 504 

Randomized complete block design, 269-275 
Randomized numbers, 120-122 
table of, 626 
Random sample, 10 
Random variable, 9 
Random variation, 443 
Range, 26 
Ratio scale, 409 

Regression analysis, see also Multiple regression; Simple linear regression 
simple linear, 300-342 
equation, 306 
multiple, 358 

Regression coefficient, 302 
Regression constant, 302 
Regression surface, 345 
Rejection region, 197 
Relative frequency distribution, 14 
Reliability coefficient, 153 
Repeated systematic sampling, 497 
Residual plots, 321-323 
Runs test, 410-412 
table for, 642 

Sample, 10, 485 
nonprobability, 119 
probability, 119 
random, 10 
simple random, 120 
Sample size, 485 
for estimating means, 171-175 
for estimating proportions, 175-177 
to control Type I and Type II errors, 237-239 
Sample surveys 
costs, 500-504 
efficiency, 500-504 
size, 500-504 
steps in, 486 

Sampled population, 146, 147 
Sampling 
simple random, 119-122 
with replacement, 120 
without replacement, 120 
Sampling distributions, 122 
construction of, 122, 123 
of difference between sample means, 133-137 
of difference between sample proportions, 140, 141 
of sample mean, 123-133 
of sample proportion, 137-139 
Sampling fraction, 485 
Sampling plan 
double, 554 
known sigma, 561 
multiple, 554 
single, 554 
unknown sigma, 564 
Sampling unit, 485 
Scatter diagram, 305 
Scientific method, 2 
Seasonal variation, 443, 454-462 
Secular trend, 443, 445-452 
Serial correlation, 322 
Set, 44 
conjoint sets, 45 
disjoint sets, 46 
empty, 45 
equal sets, 45 
intersection, 46 
union, 45 

Significance level, 195-197 
Sign test, 420-424 

Simple correlation coefficients, 330-335 
Simple linear regression 
model, 302 
nonparametric, 433-436 
Simple random sampling, 120 
definition, 120 
sample size, 171-177, 501 
Slope, 306 
confidence interval for, 318, 319 
Spearman rank correlation, 430-433 
table for, 650 
Special studies, 2-6 
steps involved in, 4-6 
Standard deviation, 28, 29 
grouped data, 32, 33 
Standard normal distribution, 102 
States of nature, 511 
Statistic, 20 

Statistical inference, 146 
Statistical programs, computer, 8 
Statistics, role in decision making, 2 




Step-down procedure, 363 
Step-up procedure, 363 
Stratified sampling, 485-491 
sample size, 501, 502 
Studentized range, table of, 639, 640 
Student’s t distribution, 157-160 
table of, 627 
Sturges’ rule, 13 
Subset, 45 
Sufficiency, 151 
Sum of squares 
due to multiple regression, 351 
error, multiple regression, 352 
total, multiple regression, 352 
Survey sampling, 482-508 
applications, 484 
Systematic sampling, 496-500 
sample size, 502 


t distribution, 157, 158 
properties, 158 
Target population, 146, 147 
Test statistic, 194, 195 
Time-series analysis, 442-473 
components of, 443 
Total deviation, 311 
Total sum of squares 
one-way ANOVA, 258 
simple linear regression, 312 
Transformations, 291, 365 
logarithmic, 366 
reciprocal, 366 
square-root, 366 
Treatment, 252 
Tree diagram, 49 
Tukey’s HSD test, 265-267 
2x2 contingency table, 390 
Two-sided test, 202 
Two-way analysis of variance, 269 
Type I error, 195, 196 
Type II error, 195, 196 

Unbiasedness, 147 
Unexplained deviation, 311 
Union, 45 
Unit of association, 305 
Unit scale, 35 
Unit set, 45 
Universal set, 45 
Universe, 485 
Unweighted aggregative price index, 475, 476 
Utility function, 533 
Utility index, 532 
Utility theory, 531-536 
assumptions, 535 


Value indexes, 471 
Variable, 9 
Variance, 27-30 
grouped data, 32, 33 
shortcut formula, 37 
Variance ratio test, 230 
Variate, 9 
Variation, 25 
Venn diagram, 45 

Weighted aggregative index, 476, 477 
Weighted arithmetic mean of relatives index, 478 
Wilcoxon signed-ranks test, 412-415 
table for, 643 

Y intercept, 306 
Yates’ correction, 391 


